WORLD INTELLECTUAL PROPERTY ORGANIZATION 
International Bureau




PCT

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT) 



(51) International Patent Classification:

C07H 21/04, C07K 1/00, 16/00, A61K
38/00, G01N 33/00, 33/566, A01N 37/18

A1



(11) International Publication Number: WO 98/07736 

(43) International Publication Date: 26 February 1998 (26.02.98) 



(21) International Application Number: PCT/US97/ 14585 

(22) International Filing Date: 18 August 1997 (18.08.97)



(30) Priority Data:

08/699,591    19 August 1996 (19.08.96)      US
08/753,007    19 November 1996 (19.11.96)    US



(71) Applicant: MILLENNIUM BIOTHERAPEUTICS, INC.
[US/US]; 640 Memorial Drive, Cambridge, MA 02139
(US).

(72) Inventors: GEARING, David, P.; 23 Standish Road, Wellesley,
MA 02181 (US). BUSFIELD, Samantha, J.; Apartment 1,
15 Trowbridge Street, Cambridge, MA 02138 (US).

(74) Agent: MEIKLEJOHN, Anita, L.; Fish & Richardson P.C., 225 
Franklin Street, Boston, MA 02110 (US). 



(81) Designated States: AU, CA, JP, European patent (AT, BE,
CH, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL,
PT, SE).



Published 

With international search report.

Before the expiration of the time limit for amending the
claims and to be republished in the event of the receipt of
amendments.



(54) Title: DON-1 GENE AND POLYPEPTIDES AND USES THEREFOR 
(57) Abstract 



The present invention relates to the identification and characterization of a novel gene called don-1 related to epidermal growth factors
(EGF) such as the neuregulins, and methods of preparing and using alternate splice forms of this gene to express new Don-1 polypeptides.







DON-1 GENE AND POLYPEPTIDES AND USES THEREFOR 
Background of the Invention 
This invention relates to a new gene, called don-1,
related to growth factors such as the neuregulins, and
methods of preparing and using alternate splice forms of 
this gene to express new Don-1 polypeptides. The 
invention also relates to the use of these new genes and 
corresponding polypeptides. 

The growth, differentiation, and survival of many
cell types depend on the binding of protein ligands to
specific cell surface receptors. Misregulation of this
interaction has been implicated in a wide variety of
tumors and developmental irregularities. For example,
the epidermal growth factor receptor (EGFR) family of
receptor-type tyrosine kinases is frequently
overexpressed, mutated, or deleted in carcinomas of the
breast, lung, ovary, brain, and gastrointestinal tract
(Prigent et al., Prog. Growth Factor Res., 4:1-24,
1992). This family of receptors, which includes
receptors referred to as EGFR, erbB2 (also called "neu"
or HER2, the human homolog of erbB2), erbB3 (HER3), and
erbB4 (HER4), respectively, may play an important role in
the modulation of tumor growth and progression. In
particular, it has been shown in several studies that
overexpression of erbB2 in a variety of human
adenocarcinomas, e.g., in breast and ovarian cancer,
correlates with a poor prognosis (see, e.g., Slamon et
al., Science, 235:177-182, 1987).

One group of ligands that bind to this family of
receptors is referred to as the neuregulin family of
ligands, which all share a common structural domain known
as an EGF motif that contains six cysteines. This motif
not only allows these ligands to bind to the receptors,
but also to mediate biological effects (Barbacci et
al., J. Biol. Chem., 270:9585-9589, 1995). Although
there appear to be multiple ligands capable of binding to
and activating members of the EGFR family, the growth
factors that bind to and activate the other members of
this receptor family, erbB2, erbB3, and erbB4, are less
well characterized.

Neuregulins are also referred to as neu
differentiation factors (NDF), glial growth factors
(GGF), heregulins, and acetylcholine-receptor-inducing
activity (ARIA) ligands, all of which are expressed as
variant splice forms of a single gene. These different
names reflect the diverse biological activities of the
neuregulins in vitro, as glial cell mitogens, receptor
binding proteins, mammary differentiation factors, and
muscle trophic factors.

Each of the neuregulin glycoproteins has been
shown to activate one or more of the receptors erbB2,
erbB3, and erbB4 (for a review, see Ben-Baruch et al.,
Proc. Soc. Exp. Biol. Med., 206:221-227, 1994). These
factors were first purified on the basis of their ability
to activate, i.e., cause phosphorylation of, the erbB2
receptor, although it has been shown subsequently that
these factors do not bind erbB2 directly (Tzahar et al.,
J. Biol. Chem., 269:25226-25233, 1994). In addition, it
has been shown that NDF causes the differentiation of
human mammary tumor cells (Peles et al., Cell,
69:559-572, 1992).

Summary of the Invention 
The present invention relates to the 
identification and characterization of a new gene, 
referred to as don-1, and alternate splice variants of
don-1, which are related to the neuregulin gene family.
The invention also relates to the polypeptides encoded by
don-1. Don-1 mRNA transcripts were expressed in various
tissues including murine brain, spleen, and lung, and
human fetal brain and fetal lung. No Don-1 transcripts
were detected in normal adult human tissues; however,
Don-1 transcripts were detected in several human
carcinoma cells. In each case, message sizes were about
3.0 kb and 4.4 kb (human) and 4.0 kb (murine).

Both murine and human cDNAs corresponding to
various splice variants of don-1 have been cloned. A
murine cDNA corresponding to a first splice variant of
this gene is represented by SEQ ID NO:1, and the amino
acid sequence of the polypeptide it encodes is
represented by SEQ ID NO:2, which is a membrane-bound
polypeptide approximately 605 amino acids in length (Fig.
1). A second murine cDNA corresponding to a second
splice variant of the don-1 gene is represented by SEQ ID
NO:3, and the amino acid sequence of the polypeptide it
encodes is represented by SEQ ID NO:4, which is a
secreted polypeptide about 181 amino acids in length
(Fig. 2).

A human cDNA corresponding to a first splice
variant of the human don-1 gene is represented by SEQ ID
NO:5, and the amino acid sequence of the polypeptide it
encodes is represented by SEQ ID NO:6, which is a
membrane-bound polypeptide approximately 407 amino acids
in length (Fig. 3). A second human cDNA corresponding to
a second splice variant of the human don-1 gene is
represented by SEQ ID NO:7, and the amino acid sequence
of the polypeptide it encodes is represented by SEQ ID
NO:8, which is a membrane-bound polypeptide of about 469
amino acids in length (Fig. 4).

A third human cDNA corresponding to a third splice
variant of the human don-1 gene was isolated by further
screening of a human fetal lung library. This sequence
had an extended sequence compared to the first two
clones, and included a termination codon. This sequence
is represented by SEQ ID NO:31, and the amino acid
sequence of the polypeptide it encodes is represented by
SEQ ID NO:32, which is a membrane-bound polypeptide of
about 647 amino acids in length (Fig. 7). This sequence
appears to be an extended version of the second splice
variant (SEQ ID NO:8), although three amino acids differ
at the 3' end of SEQ ID NO:32. This third splice variant
extends a further 178 amino acids compared to the second
human splice variant, and is 94% homologous to murine
Don-1 (SEQ ID NO:2) over this region.

In addition, the invention relates to methods of
obtaining additional novel ligands that activate some or
all members of the EGF receptor family of receptor-type
tyrosine kinases, and methods of treating and diagnosing
cell proliferative diseases.

In general, the invention features an isolated
nucleic acid which encodes a mammalian Don-1 polypeptide,
e.g., a polypeptide encoded by any splice variant of a
don-1 gene. For example, the nucleic acid can include
all or a portion of the nucleotide sequence of, e.g.,
Fig. 1, SEQ ID NO:1 (murine), Fig. 2, SEQ ID NO:3
(murine), Fig. 3, SEQ ID NO:5 (human), Fig. 4, SEQ ID
NO:7 (human), Fig. 7, SEQ ID NO:31 (human), the sequence
encoding the epidermal growth factor (EGF) domain of
Don-1 having SEQ ID NO:11, or the extracellular domain of
Don-1.

The term "nucleic acid" encompasses both RNA and
DNA, including cDNA, genomic DNA, and synthetic (e.g.,
chemically synthesized) DNA. The nucleic acid may be
double-stranded or single-stranded. Where
single-stranded, the nucleic acid may be a sense strand or an
antisense strand.

By "isolated nucleic acid" is meant a DNA or RNA
that is not immediately contiguous with both of the
coding sequences with which it is immediately contiguous
(one on the 5' end and one on the 3' end) in the
naturally occurring genome of the organism from which it
is derived. Thus, in one embodiment, an isolated nucleic
acid includes some or all of the 5' non-coding (e.g.,
promoter) sequences which are immediately contiguous to
the coding sequence. The term therefore includes, for
example, a recombinant DNA which is incorporated into a
vector, into an autonomously replicating plasmid or
virus, or into the genomic DNA of a prokaryote or
eukaryote, or which exists as a separate molecule (e.g.,
a cDNA or a genomic DNA fragment produced by PCR or
restriction endonuclease treatment) independent of other
sequences. It also includes a recombinant DNA which is
part of a hybrid gene encoding additional polypeptide
sequence. The term "isolated" as used herein also refers
to a nucleic acid or peptide that is substantially free
of cellular material, viral material, or culture medium
when produced by recombinant DNA techniques, or chemical
precursors or other chemicals when chemically
synthesized. Moreover, an "isolated nucleic acid" is
meant to include nucleic acid fragments which are not
naturally occurring as fragments and would not be found
in the natural state.

A nucleic acid sequence that is "substantially
identical" to a don-1 nucleotide sequence is at least 80%
or 85%, preferably 90%, and more preferably 95% or more
(e.g., 99%) identical to the nucleotide sequence of the
human don-1 cDNA of SEQ ID NO:5, NO:7, or NO:31, or the
murine don-1 cDNA of SEQ ID NO:1 or NO:3. For purposes
of comparison of nucleic acids, the length of the
reference nucleic acid sequence will generally be at
least 40 nucleotides, preferably at least 60 nucleotides,
more preferably at least 75 to 110, or more nucleotides.

Sequence identity can be measured using sequence
analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of
Wisconsin Biotechnology Center, 1710 University Avenue,
Madison, WI 53705).

The invention also encompasses nucleic acid
sequences that encode forms of Don-1 in which naturally
occurring amino acid sequences are altered or deleted.

The invention also features isolated nucleic acid
sequences that encode one or more portions or domains of
Don-1, including but not limited to the Ig domain, the TM
domain, the extracellular domain, the cytoplasmic domain,
and various functional domains of Don-1, such as the EGF
domain. The nucleic acids also include those of the
don-1 gene contained in A.T.C.C. deposit numbers 98096,
98097, or 98098.

Preferred nucleic acids encode polypeptides that
are soluble under normal physiological conditions. Also
within the invention are nucleic acids encoding fusion
proteins in which a portion of Don-1 (e.g., one or more
domains) is fused to an unrelated protein or polypeptide
(e.g., a marker polypeptide or a fusion partner) to
create a fusion protein. For example, the polypeptide
can be fused to a hexa-histidine tag to facilitate
purification of bacterially expressed protein, or to a
hemagglutinin tag to facilitate purification of protein
expressed in eukaryotic cells.

The fusion partner can be, for example, a
polypeptide which facilitates secretion, e.g., a
secretory sequence. Such a fused protein is typically
referred to as a preprotein. The secretory sequence can
be cleaved by the host cell to form the mature protein.
Also within the invention are nucleic acids that encode
mature Don-1 fused to a polypeptide sequence to produce
an inactive proprotein. Proproteins can be converted
into the active form of the protein by removal of the
inactivating sequence.

The nucleic acids further include nucleic acids
that hybridize, e.g., under stringent hybridization
conditions (as defined herein), to all or a portion
(e.g., the TM or EGF domains) of the nucleotide sequence
of SEQ ID NO:1, 3, 5, 7, 31, or its complement, or to the
nucleotide sequence of the don-1 gene contained in
A.T.C.C. deposit 98096, 98097, or 98098, e.g., nucleic
acids that encode polypeptides that activate
receptor-type tyrosine kinases that have a molecular weight of
about 185 kDa.

The hybridizing portion of the hybridizing nucleic
acids is preferably 20, 30, 50, or 70 bases long.
Preferably, the hybridizing portion of the hybridizing
nucleic acid is 80%, more preferably 95%, or even 98%
identical to the sequence of a portion or all of a
nucleic acid encoding a Don-1 polypeptide. Hybridizing
nucleic acids of the type described above can be used as
a cloning probe, a primer (e.g., a PCR primer), or a
diagnostic probe. Preferred hybridizing nucleic acids
encode a polypeptide having some or all of the biological
activities possessed by a naturally-occurring Don-1
polypeptide, e.g., as determined in the p185 assay
described below.

Hybridizing nucleic acids can be additional splice
variants of the don-1 gene. Thus, they may encode a
protein which is shorter or longer than the different
forms of Don-1 described herein. Hybridizing nucleic
acids may also encode proteins that are related to Don-1
(e.g., proteins encoded by genes which include a portion
having a relatively high degree of identity to the don-1
gene described herein).

In another embodiment, the invention features
cells, e.g., transformed host cells, harboring a nucleic
acid encompassed by the invention. By "transformed cell"
is meant a cell into which (or into an ancestor of which)
has been introduced, by means of recombinant DNA
techniques, a DNA molecule encoding a Don-1 polypeptide.

The invention also features vectors and plasmids
that include a nucleic acid of the invention which is
operably linked to a transcription and/or translation
sequence to enable expression, e.g., expression vectors.
By "operably linked" is meant that a selected nucleic
acid, e.g., a DNA molecule encoding a Don-1 polypeptide,
is positioned adjacent to one or more sequence elements,
e.g., a promoter, which direct transcription and/or
translation of the sequence such that the sequence
elements can control transcription and/or translation of
the selected nucleic acid.

The invention also features purified or isolated
Don-1 polypeptides. As used herein, both "protein" and
"polypeptide" mean any chain of amino acids, regardless
of length or post-translational modification (e.g.,
glycosylation or phosphorylation). Thus, the term "Don-1
polypeptide" (or Don-1) includes full-length, naturally
occurring Don-1 protein, as well as recombinantly or
synthetically produced polypeptides that correspond to a
full-length, naturally occurring Don-1 protein or to
particular domains or portions of a naturally occurring
protein.

By a "purified" or "isolated" compound is meant a
composition which is at least 60% by weight (dry weight)
the compound of interest, e.g., a Don-1 polypeptide or
antibody. Preferably the preparation is at least 75%,
more preferably at least 90%, and most preferably at
least 99%, by weight the compound of interest. Purity
can be measured by any appropriate standard method, e.g.,
column chromatography, polyacrylamide gel
electrophoresis, or HPLC analysis.

Preferred Don-1 polypeptides include a sequence
substantially identical to all or a portion of a
naturally occurring Don-1 polypeptide, e.g., including
all or a portion of the human sequence shown in Fig. 3
(SEQ ID NO:6), Fig. 4 (SEQ ID NO:8), or Fig. 7 (SEQ ID
NO:32), or the murine sequence shown in Fig. 1 (SEQ ID
NO:2) or Fig. 2 (SEQ ID NO:4). Polypeptides
"substantially identical" to the Don-1 polypeptide
sequences described herein have an amino acid sequence
that is at least 80% or 85%, preferably 90%, and more
preferably 95% or more (e.g., 99%) identical to the amino
acid sequence of the Don-1 polypeptides of SEQ ID NOs:2,
4, 6, or 8. For purposes of comparison, the length of
the reference Don-1 polypeptide sequence will generally
be at least 16 amino acids, preferably at least 20 amino
acids, more preferably at least 25 amino acids, and most
preferably 35 amino acids.

In the case of polypeptide sequences which are
less than 100% identical to a reference sequence, the
non-identical positions are preferably, but not
necessarily, conservative substitutions for the reference
sequence. Conservative substitutions typically include
substitutions within the following groups: glycine and
alanine; valine, isoleucine, and leucine; aspartic acid
and glutamic acid; asparagine and glutamine; serine and
threonine; lysine and arginine; and phenylalanine and
tyrosine.
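The substitution groups listed above can be expressed as a simple lookup. The following Python sketch is illustrative only: the names `CONSERVATIVE_GROUPS`, `is_conservative`, and `classify_positions` are hypothetical and do not appear in the specification, and the two sequences are assumed to be already aligned and of equal length.

```python
# Conservative substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    frozenset("GA"),   # glycine, alanine
    frozenset("VIL"),  # valine, isoleucine, leucine
    frozenset("DE"),   # aspartic acid, glutamic acid
    frozenset("NQ"),   # asparagine, glutamine
    frozenset("ST"),   # serine, threonine
    frozenset("KR"),   # lysine, arginine
    frozenset("FY"),   # phenylalanine, tyrosine
]


def is_conservative(ref: str, alt: str) -> bool:
    """True if ref -> alt is a substitution within one of the listed groups."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        return False  # an identical residue is not a substitution
    return any(ref in group and alt in group for group in CONSERVATIVE_GROUPS)


def classify_positions(reference: str, candidate: str) -> dict:
    """Tally identical, conservative, and other positions.

    Assumes the two sequences are pre-aligned and equal in length.
    """
    if len(reference) != len(candidate):
        raise ValueError("sequences must be pre-aligned to equal length")
    counts = {"identical": 0, "conservative": 0, "other": 0}
    for r, c in zip(reference.upper(), candidate.upper()):
        if r == c:
            counts["identical"] += 1
        elif is_conservative(r, c):
            counts["conservative"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Toy five-residue example, not a Don-1 sequence.
    print(classify_positions("GAVKF", "AAVRW"))
```

In the toy example, G to A and K to R fall within listed groups, while F to W does not.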

Where a particular polypeptide is said to have a
specific percent identity to a reference polypeptide of a
defined length, the percent identity is relative to the
reference peptide. Thus, a peptide that is 50% identical
to a reference polypeptide that is 100 amino acids long
can be a 50 amino acid polypeptide that is completely
identical to a 50 amino acid long portion of the
reference polypeptide. It also might be a 100 amino acid
long polypeptide which is 50% identical to the reference
polypeptide over its entire length. Of course, many
other polypeptides will meet the same criteria.
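The two cases described above (a fully identical 50-residue fragment and a half-identical 100-residue polypeptide) can be checked numerically. The following Python sketch is an assumption-laden illustration: it uses a simple ungapped sliding comparison, which is not the method of the sequence analysis software named in the specification, and the function name `percent_identity` and the test sequences are hypothetical.

```python
def percent_identity(reference: str, candidate: str) -> float:
    """Best ungapped percent identity, expressed relative to the reference length.

    The candidate is slid across the reference at every offset; the largest
    number of matching positions is divided by the length of the reference.
    """
    if not reference:
        raise ValueError("reference must not be empty")
    best = 0
    # offset is the reference index aligned with candidate[0]
    for offset in range(-len(candidate) + 1, len(reference)):
        matches = 0
        for j, residue in enumerate(candidate):
            i = offset + j
            if 0 <= i < len(reference) and reference[i] == residue:
                matches += 1
        best = max(best, matches)
    return 100.0 * best / len(reference)


if __name__ == "__main__":
    # Synthetic 100-residue reference, not a Don-1 sequence.
    reference = "ACDEFGHIKLMNPQRSTVWY" * 5
    fragment = reference[25:75]                # 50 residues, fully identical
    half_match = reference[:50] + "X" * 50     # 100 residues, 50 identical
    print(percent_identity(reference, fragment))
    print(percent_identity(reference, half_match))
```

Both candidates score 50% because the denominator is always the length of the reference, as the paragraph above states.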

The polypeptides of the invention include, but are
not limited to: recombinant polypeptides, natural
polypeptides, and synthetic polypeptides as well as
polypeptides which are preproteins or proproteins.

Polypeptides identical or substantially identical
to one or more domains of human, murine, or other
mammalian Don-1, e.g., the EGF domain (e.g., SEQ ID
NO:11) (about amino acid 142 to about amino acid 178 of
human Don-1 cDNA SEQ ID NOs:8 and 32, or amino acids 104
to 140 of human Don-1 cDNA SEQ ID NO:6 described herein),
or the transmembrane (TM) domain (e.g., SEQ ID
NO:20) (about amino acid 203 to about amino acid 225 of
human Don-1 cDNA SEQ ID NOs:8 and 32, or amino acids 173
to 195 of human Don-1 cDNA SEQ ID NO:6 described herein),
are also within the scope of the invention.
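The domain boundaries quoted above are 1-based, inclusive residue positions. As a hedged illustration, the Python sketch below records those positions and shows how they translate into sequence slices. The table `DOMAINS` and the helpers `domain_length` and `extract_domain` are hypothetical names, the coordinates are the approximate ("about") values given in the text, and the demonstration string is a placeholder rather than a Don-1 sequence.

```python
# Approximate 1-based, inclusive domain coordinates as quoted in the text.
DOMAINS = {
    ("SEQ ID NO:8", "EGF"): (142, 178),
    ("SEQ ID NO:8", "TM"): (203, 225),
    ("SEQ ID NO:6", "EGF"): (104, 140),
    ("SEQ ID NO:6", "TM"): (173, 195),
}


def domain_length(start: int, end: int) -> int:
    """Number of residues in a 1-based inclusive range."""
    return end - start + 1


def extract_domain(sequence: str, start: int, end: int) -> str:
    """Slice a 1-based inclusive range out of a sequence string."""
    if not (1 <= start <= end <= len(sequence)):
        raise ValueError("coordinates fall outside the sequence")
    return sequence[start - 1:end]


if __name__ == "__main__":
    for (seq_id, name), (start, end) in DOMAINS.items():
        print(seq_id, name, domain_length(start, end))
    # Placeholder string used only to demonstrate the slicing convention.
    print(extract_domain("ABCDEFGHIJ", 3, 5))
```

With these coordinates the EGF domain spans 37 residues and the TM domain 23 residues in both numbering schemes.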

Polypeptides encoded by the don-1 gene contained
in A.T.C.C. deposit 98096, 98097, or 98098 are also
included within the invention.

Preferred polypeptides are those which are soluble
under normal physiological conditions. Also within the
invention are soluble fusion proteins in which a
full-length form of Don-1 or a portion (e.g., one or more
domains) thereof is fused to an unrelated protein or
polypeptide (i.e., a fusion partner) to create a fusion
protein.

The invention also features isolated polypeptides
(and the nucleic acids that encode these polypeptides)
that include a first portion and a second portion; the
first portion includes a Don-1 polypeptide, e.g., the
epidermal growth factor (EGF) domain of Don-1, and the
second portion includes an immunoglobulin constant (Fc)
region or a detectable marker.

In addition, the invention features a
pharmaceutical composition which includes a Don-1
polypeptide and a physiologically acceptable or inert
carrier, such as saline.
The invention also features purified or isolated
antibodies that specifically bind to a Don-1 polypeptide,
or a specific region or domain of a naturally occurring
Don-1 protein. By "specifically binds" is meant an
antibody that recognizes and binds to a particular
antigen, e.g., a Don-1 polypeptide, but which does not
substantially recognize and bind to other molecules in a
sample, e.g., a biological sample, which naturally
includes Don-1. In a preferred embodiment the antibody
is a monoclonal antibody.

The invention also features antagonists and
agonists of Don-1. Antagonists can inhibit one or more
of the functions of Don-1. Suitable antagonists include
large or small molecules, antibodies to Don-1, and Don-1
polypeptides which compete with a native form of Don-1.
Agonists of Don-1 enhance or facilitate one or more of
the functions of Don-1. Suitable agonists include, for
example, large or small molecules and anti-idiotype
antibodies that mimic the biological effects of Don-1.

Also within the invention are nucleic acid
molecules that can be used to interfere with Don-1
expression, e.g., antisense molecules and ribozymes.

In another aspect, the invention features a method
for detecting a Don-1 polypeptide. This method includes:
obtaining a biological sample; contacting the sample with
an antibody that specifically binds a Don-1 polypeptide,
under conditions that allow the formation of
Don-1-antibody complexes; and detecting the complexes, if any,
as an indication of the presence of Don-1 in the
biological sample.

In another aspect, the invention features a method
for stimulating proliferation of a cell, by administering
to the cell an amount of a Don-1 polypeptide effective to
stimulate proliferation of the cell. The invention also
features a method for decreasing proliferation of a cell,
by administering to the cell an amount of a Don-1
polypeptide inhibitor effective to decrease proliferation
of the cell. This method can be used to treat tumors,
e.g., adenocarcinomas, caused by the over-proliferation
of cells in a patient. Preferably the inhibitor is an
antibody which selectively binds to Don-1.

In another embodiment, the invention features a
method of obtaining a splice variant cDNA of the don-1
gene. The method includes the steps of obtaining a
labeled probe comprising an isolated nucleic acid that
encodes all or a portion of the epidermal growth factor
(EGF) domain of Don-1, e.g., having the amino acid
sequence of SEQ ID NO:11; screening a nucleic acid
fragment library with the labeled probe under conditions
that allow hybridization of the probe to nucleic acid
fragments in the library to form nucleic acid duplexes;
isolating labeled duplexes, if any; and preparing a
full-length cDNA from the fragments in any labeled duplex to
obtain a splice variant cDNA of the don-1 gene.

The invention further features a method of
obtaining a gene related to the don-1 gene, by obtaining
a labeled probe comprising an isolated nucleic acid that
encodes all or a portion of the transmembrane (TM) domain
of Don-1, e.g., having the amino acid sequence of SEQ ID
NO:20; screening a nucleic acid fragment library with the
labeled probe under conditions that allow hybridization
of the probe to nucleic acid fragments in the library to
form nucleic acid duplexes; isolating labeled duplexes,
if any; and preparing a full-length gene sequence from
the nucleic acid fragments in any labeled duplex to
obtain a gene related to the don-1 gene.

The invention also features a purified protein
that functionally interacts with Don-1, and a nucleic
acid that encodes a protein that functionally interacts
with Don-1.

Unless otherwise defined, all technical and
scientific terms used herein have the same meaning as
commonly understood by one of ordinary skill in the art
to which this invention belongs. Although methods and
materials similar or equivalent to those described herein
can be used in the practice or testing of the present
invention, the preferred methods and materials are
described herein. All publications, patent applications,
patents, and other references mentioned herein are
incorporated by reference in their entirety. In the case
of conflict, the present specification, including
definitions, will control. In addition, the materials,
methods, and examples are illustrative only and are not
intended to be limiting.

Other features and advantages of the invention 
will be apparent from the following detailed 
descriptions, and from the claims. 

Brief Description of the Drawings 
Fig. 1 is a representation of the nucleic acid
(SEQ ID NO:1) of a murine cDNA corresponding to a
membrane-bound splice variant of the don-1 gene, and the
amino acid sequence (SEQ ID NO:2) it encodes.

Fig. 2 is a representation of the nucleic acid
(SEQ ID NO:3) of a second murine cDNA corresponding to a
secreted splice variant of the don-1 gene, and the amino
acid sequence (SEQ ID NO:4) it encodes.

Fig. 3 is a representation of the nucleic acid
(SEQ ID NO:5) of a human cDNA corresponding to a
membrane-bound splice variant of the human don-1 gene,
and the amino acid sequence (SEQ ID NO:6) it encodes.

Fig. 4 is a representation of the nucleic acid
(SEQ ID NO:7) of a human cDNA corresponding to a second
splice variant of the human don-1 gene, and the amino
acid sequence (SEQ ID NO:8) it encodes.

Fig. 5 is a multi-sequence alignment of the amino
acid SEQ ID NOs:2, 4, 6, and 8 of Figs. 1 to 4, as well
as the amino acid sequence of rat neu differentiation
factor (NDF) (Genbank Accession No. A38220; SEQ ID NO:9)
and human heregulin-β (Genbank Accession No. B43273; SEQ
ID NO:10). In this figure, an asterisk above the aligned
sequences indicates the location of conserved cysteines
in the EGF domain. The transmembrane domains are boxed.

Fig. 6 is a representation of a sequence alignment
of the EGF domain of Don-1 (SEQ ID NO:11) with the growth
factor domains of members of the neuregulin/heregulin
family and human heparin binding-EGF (hb-EGF). The
domain is bounded by cysteines, and contains a total of
six conserved cysteines. Fig. 6 shows additional amino
acids upstream and downstream of the EGF domain. Amino
acid sequences correspond to a Don-1 EGF polypeptide (SEQ
ID NO:11); human heregulin-α (Genbank Accession No.
A43273; SEQ ID NO:12); rat NDF (Genbank Accession No.
A38220; SEQ ID NO:13); human heregulin-β1 (Genbank
Accession No. A43273; SEQ ID NO:14); chicken ARIA
(Genbank Accession No. A45769; SEQ ID NO:15); human
heparin binding-EGF (Genbank Accession No. A38432; SEQ ID
NO:16); human EGF (Genbank Accession No. P01133; SEQ ID
NO:17); human amphiregulin (Genbank Accession No. 179040;
SEQ ID NO:18); and human TGF-α (Genbank Accession No.
339546; SEQ ID NO:19).

Fig. 7 is a representation of the nucleic acid
(SEQ ID NO:31) of a human cDNA corresponding to a third
splice variant of the human don-1 gene, and the amino
acid sequence (SEQ ID NO:32) it encodes.

Detailed Description 
Don-1 polypeptides, described here for the first
time, are a family of novel glycoprotein ligands related
to epidermal growth factors such as the neuregulins. The
different Don-1 polypeptides are encoded by different
splice variants of the don-1 gene. Don-1 plays a role in
proliferation of carcinomas including adenocarcinoma,
myeloma, glioma, and melanoma, as well as in cell
differentiation, proliferation, and survival.

Don-1 polypeptides have a mosaic grouping of
functional domains similar to those found in neuregulins
(Wen et al., Cell, 69:559-572, 1992). For example,
similar to NDF, both secreted and membrane-bound forms of
Don-1 polypeptides include an EGF domain, which enables
these ligands to bind to EGF receptors, and to mediate
biological effects. As described herein, the EGF domain
can also be used to obtain additional splice variants of
the don-1 gene.

Also like NDF, membrane-bound forms of Don-1 (SEQ
ID NOs:2, 6, 8, and 32) contain a recognized Ig domain, a
transmembrane (TM) domain (VLTITGICVALLVVGIVCVVAYC, SEQ
ID NO:20), and a cytoplasmic domain. The Ig domain
should be important in protein-protein interactions. As
described herein, the TM domain can be used to obtain
additional new genes related to the don-1 gene. A
secreted form of murine Don-1 (SEQ ID NO:4) is a variant
splice form that lacks the transmembrane sequence. These
domains are described in detail below.

As shown in Fig. 5, comparison of the sequence of a
human Don-1 cDNA (SEQ ID NO:8), isolated from human
fetal brain, with the mouse sequence revealed that the
EGF domain (about amino acid 142 to about amino acid 178)
is 100% identical to the EGF domain in the mouse Don-1
amino acid sequence of SEQ ID NO:2 (about amino acids 104
to 140). In addition, the TM domains (boxed in Fig. 5)
appear to be highly conserved between mouse and human
Don-1 (identical; SEQ ID NO:20), and between Don-1, NDF,
and heregulin (2 differences in 23 amino acids). The
generic TM domain sequence is VLTITGICX1ALLVVGIX2CVVAYC
(SEQ ID NO:21), where X1 is I or V, and X2 is M or V.
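
The generic TM sequence can be expressed as a simple pattern. The following is a minimal sketch, assuming that the two variable positions are X1 (I or V) and X2 (M or V) and that the Don-1 TM domain of SEQ ID NO:20 reads VLTITGICVALLVVGIVCVVAYC (23 residues). The NDF-type sequence shown is an assumption built by taking the other allowed residue at each variable position; the names `TM_PATTERN`, `DON1_TM`, `NDF_TYPE_TM`, and `count_differences` are illustrative only.

```python
import re

# Generic TM pattern (SEQ ID NO:21): X1 is I or V, X2 is M or V.
TM_PATTERN = re.compile(r"VLTITGIC[IV]ALLVVGI[MV]CVVAYC")

# Don-1 TM domain (SEQ ID NO:20), assumed reading: X1 = V, X2 = V.
DON1_TM = "VLTITGICVALLVVGIVCVVAYC"

# Assumed NDF-type TM domain: the other allowed residue at each X position.
NDF_TYPE_TM = "VLTITGICIALLVVGIMCVVAYC"


def count_differences(a: str, b: str) -> int:
    """Count mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    for name, seq in (("Don-1", DON1_TM), ("NDF-type", NDF_TYPE_TM)):
        print(name, len(seq), bool(TM_PATTERN.fullmatch(seq)))
    print("differences:", count_differences(DON1_TM, NDF_TYPE_TM))
```

Under these assumptions both sequences fit the generic pattern and differ at two of 23 positions, consistent with the comparison stated above.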

The two neighboring basic amino acids adjacent to the
transmembrane region (amino acids Lys-171 and Arg-172 in
the human SEQ ID NO:6; amino acids Lys-201 and Arg-202 in
the human SEQ ID NOs:8 and 32; amino acids Lys-163 and
Arg-164 in the murine form SEQ ID NO:2) provide for the
possibility of processing these proteins with proteolytic
enzymes to detach them from the cell membrane.

Fig. 5 shows the primary structure of both murine
and human forms of Don-1 (SEQ ID NOs:2, 4, 6, and 8), as
well as the primary structures of rat NDF (SEQ ID NO:9)
and human heregulin-β (SEQ ID NO:10). As can be seen from
this figure, these sequences have highly conserved Ig,
EGF (extracellular), and TM domains. Further, there is
high homology in the cytoplasmic domains.

Expression of Don-1 in human tissues appeared to
be restricted to fetal brain and lung tissues. No Don-1
transcripts were detected in normal adult human tissues
using a murine Don-1 cDNA as a probe. However, Don-1
transcripts were detected in the human colon
adenocarcinoma cell line SW480 and in the human melanoma
cell line G361. In these cell lines there were two major
Don-1 transcripts of about 4.4 kb and about 3 kb.

Overall, the human Don-1 polypeptide of SEQ ID NO:8
described herein is 95% identical and 98% similar (based
on conservative substitutions) at the amino acid level to
the murine Don-1 polypeptide of SEQ ID NO:2 described
herein. The highest homology between the two forms is
found in the EGF and transmembrane domains, suggesting
that both domains have important functional roles. High
homology between the two forms is also found in the Ig
and cytoplasmic domains.

Don-1 Proteins and Polypeptides

Don-1 proteins and polypeptides and Don-1 fusion
proteins can be prepared for a wide range of uses
including, but not limited to, generation of antibodies,
preparation of reagents for diagnostic assays,
identification of other molecules involved in neoplasia
and proliferation (particularly of adenocarcinoma),
preparation of reagents for use in screening assays for
neoplasm modulators, and preparation of therapeutic
agents for treatment of tumor-related disorders.

The don-1 gene was originally isolated from a
screen of a murine choroid plexus cDNA library. Further
screening of other murine and human tissue sources
yielded three additional clones of this gene, all
representing different splice variants. Based on these
cDNA sequences, the don-1 gene can also be obtained by
chemical synthesis using one of the methods described in
Engels et al. (Angew. Chem. Int. Ed. Engl., 28:716-734,
1989). These methods include triester, phosphite,
phosphoramidite and H-phosphonate methods, PCR and other
autoprimer methods, and oligonucleotide syntheses on
solid supports. These methods may be used if the entire
nucleic acid sequence of the gene is known, or the
sequence of the nucleic acid complementary to the coding
strand is available. Alternatively, if the target
amino acid sequence is known, one may infer potential
nucleic acid sequences using known and preferred coding
residues for each amino acid residue.
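
Inferring nucleic acid sequences from an amino acid sequence is limited by codon degeneracy. As a rough illustration, the sketch below counts how many distinct nucleotide sequences could encode a given peptide under the standard genetic code. The function name `degeneracy` and the choice of example peptide (the first eight residues of the EGF domain sequence given elsewhere in this description) are illustrative assumptions, not part of the described methods.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code
# (the 61 sense codons; stop codons are not included).
CODONS_PER_RESIDUE = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4,
    "H": 2, "I": 3, "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6,
    "T": 4, "W": 1, "Y": 2, "V": 4,
}


def degeneracy(peptide: str) -> int:
    """Return the number of distinct coding sequences for a peptide."""
    return prod(CODONS_PER_RESIDUE[aa] for aa in peptide.upper())


if __name__ == "__main__":
    # Hypothetical example: first eight residues of the Don-1 EGF domain.
    print(degeneracy("CNETAKSY"))  # 3072
```

Peptide stretches rich in methionine or tryptophan (one codon each) give the fewest candidate sequences, which is why such stretches are commonly favored when choosing sites for degenerate oligonucleotides.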

In particular, Fig. 1 shows the cDNA of one murine
splice variant of don-1 (SEQ ID NO:1), which encodes a
predicted protein of about 605 amino acids (SEQ ID NO:2).
This clone was isolated from a murine lung cDNA library.
The Ig domain begins at a cysteine at about location 16
and extends to a cysteine at about location 70, and
should be important in protein-protein interactions. The
EGF domain (SEQ ID NO:11), which is predicted to contain
the active part of the protein, begins at a cysteine at
about amino acid location 104 and extends to a cysteine
at about amino acid location 140 in this cDNA.

The spacing of the 6 cysteine residues and an
important glycine residue (amino acid 137) in the EGF
domain is conserved between Don-1 and EGF, although
homology over this region reveals that Don-1 is more
similar to NDF (47% identity) than to EGF (35% identity).
In general, the EGF domain of Don-1 related polypeptides
follows this formula: the first C, followed by
7 amino acids; the second C, followed by 4 or 5 amino
acids; the third C, followed by 10-13 amino acids; the
fourth C, followed by 1 amino acid; the fifth C, followed
by 8 amino acids; and then the sixth C.

The EGF domain of Don-1
(CNETAKSYCVNGGVCYYIEGINQLSCKCPNGFFGQRC, SEQ ID NO:11) is
identical in all five splice variants, both murine and
human. Thus, probes designed based on the nucleotide
region encoding this EGF domain can be used, as described
herein, to obtain, in humans, mice, and other animals,
additional splice variant cDNAs of the don-1 gene.
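
The cysteine spacing formula for the EGF domain can be checked mechanically against SEQ ID NO:11. The sketch below is one way to do so; it assumes that no additional cysteines occur between the six conserved ones (the text does not state this explicitly), and the names `EGF_SPACING`, `DON1_EGF`, `cysteine_gaps`, and `matches_egf_formula` are illustrative.

```python
import re

# C-x(7)-C-x(4,5)-C-x(10,13)-C-x(1)-C-x(8)-C, with the added assumption
# that the spacer residues are not themselves cysteines.
EGF_SPACING = re.compile(
    r"C[^C]{7}C[^C]{4,5}C[^C]{10,13}C[^C]C[^C]{8}C"
)

# EGF domain of Don-1 (SEQ ID NO:11), residues 104-140 of SEQ ID NO:2.
DON1_EGF = "CNETAKSYCVNGGVCYYIEGINQLSCKCPNGFFGQRC"


def cysteine_gaps(seq: str) -> list[int]:
    """Return the number of residues between consecutive cysteines."""
    positions = [i for i, aa in enumerate(seq) if aa == "C"]
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


def matches_egf_formula(seq: str) -> bool:
    """True if the whole sequence fits the cysteine spacing formula."""
    return EGF_SPACING.fullmatch(seq) is not None


if __name__ == "__main__":
    print(cysteine_gaps(DON1_EGF))        # [7, 5, 10, 1, 8]
    print(matches_egf_formula(DON1_EGF))  # True
    # Residue 137 of the full-length murine protein lies at offset 137 - 104.
    print(DON1_EGF[137 - 104])            # G
```

The same pattern could be applied to candidate sequences from other clones as a first screen, with any hit still needing confirmation by alignment.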

The murine Don-1 polypeptide of Fig. 1 also
includes a TM domain of approximately 23 amino acids
extending from about amino acid location 165 to about
amino acid location 187. Immediately prior to the TM
domain are two basic residues (amino acids 163 and 164)
that should function as a proteolytic cleavage site.
Cleavage at this site would release soluble ligand from
the cell membrane. The cytoplasmic domain of Don-1
extends from about amino acid 183 to about amino acid
605.

The Don-1 TM domain (VLTITGICVALLVVGIVCVVAYC, SEQ
ID NO:20), like the EGF domain, is also highly conserved
in the murine and human membrane-bound splice variants of
Don-1 that include this domain (murine SEQ ID NO:4 does
not). In fact, the TM domain is identical in both human
splice variants and the membrane-bound form of the murine
splice variants. As shown in Fig. 5, this Don-1 TM
domain is also highly conserved in other, related
proteins, such as rat NDF and human heregulin-β. Thus,
probes designed based on the nucleotide region encoding
this TM domain can be used as described herein to obtain,
in humans, mice, and other animals, additional genes
related to the don-1 gene.

Fig. 2 shows a second murine cDNA that corresponds
to another splice variant of murine don-1 (SEQ ID NO:3),
which encodes a Don-1 polypeptide of 181 amino acids (SEQ
ID NO:4). To obtain the nucleotide and amino acid
sequences in Fig. 2, a 1.4 kb cDNA that contained an open
reading frame of 139 amino acids was isolated from a
mouse choroid plexus library. This partial clone
contained no 5' ATG initiation codon and terminated after
the EGF domain. This original clone was then used as a
probe to isolate other mouse and human splice variants.
The other murine splice variant, SEQ ID NO:1 (Fig. 1),
represents a longer, transmembrane-bound version of the
original clone. Based on the high homology between the
two mouse clones over the Ig and EGF domains, the
chimeric clone of Fig. 2 was constructed and designated
as the murine Don-1 cDNA of SEQ ID NO:3. This cDNA
encompasses the nucleotide sequence encoding the first 42
amino acids of murine Don-1 SEQ ID NO:2, and the
remaining 139 amino acids of the original murine Don-1
clone. The resulting chimera is 181 amino acids in
length.

This splice variant does not contain a TM domain,
and is thus a secreted protein. The structure of this
second splice variant is identical to the polypeptide of
SEQ ID NO:2 from amino acid 1 to amino acid 155. Thus,
the EGF domain (SEQ ID NO:11), which is predicted to
contain the biologically active part of the protein,
begins at about amino acid location 104 and extends to
about amino acid location 140 in this cDNA.

Fig. 3 shows a cDNA of a human splice variant of
the don-1 gene (SEQ ID NO:5), which encodes a polypeptide
of about 407 amino acids in length (SEQ ID NO:6). This
clone was isolated from a human fetal lung cDNA library.
This polypeptide includes an apparent Ig domain extending
from a cysteine at about location 16 to a cysteine at
about location 70; an EGF domain extending from a
cysteine at about location 104 to a cysteine at about
amino acid location 140; a transmembrane domain from
about amino acid 173 to about amino acid 195; and a
cytoplasmic domain of approximately 211 amino acids
extending from about amino acid 196 to about amino acid
407. In addition, this splice variant includes an extra
8 amino acids in the juxtamembrane region (at locations
157 to 164) compared to the other three splice variants.

Fig. 4 shows a second human cDNA corresponding to
another splice variant of human don-1 (SEQ ID NO:7),
which encodes a polypeptide of about 469 amino acids in
length (SEQ ID NO:8). This second human clone was also
isolated from a human fetal lung cDNA library. This
polypeptide includes an apparent Ig domain extending from
a cysteine at about location 54 to a cysteine at about
location 108; an EGF domain extending from about amino
acid location 142 to about amino acid location 178; a
transmembrane domain from about amino acid location 203
to about amino acid location 225; and a cytoplasmic
domain of approximately 243 amino acids extending from
about amino acid 226 to amino acid 469.

Fig. 7 shows a third human cDNA corresponding to a
third splice variant of the human don-1 gene (SEQ ID
NO:31), which encodes a polypeptide of about 647 amino
acids in length (SEQ ID NO:32). This third human clone
was also isolated from a human fetal lung cDNA library.
This polypeptide includes an apparent Ig domain extending
from a cysteine at about location 54 to a cysteine at
about location 108; an EGF domain extending from about
amino acid location 142 to about amino acid location 178;
a transmembrane domain from about amino acid location 203
to about amino acid location 225; and a cytoplasmic
domain of approximately 421 amino acids extending from
about amino acid 226 to amino acid 647 (which is the end
of the polypeptide in view of the termination codon).
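
The approximate domain boundaries given for the three human polypeptides can be collected into a small lookup table. The sketch below is illustrative only: the dictionary layout and the helper names `span_length` and `domain_at` are assumptions, and the coordinates are the "about" values quoted in the text, treated here as 1-based and inclusive. Counted inclusively, the cytoplasmic spans come out one residue longer than the approximate end-minus-start figures quoted above.

```python
# Approximate domain boundaries (1-based, inclusive) as quoted in the text.
DOMAINS = {
    "SEQ ID NO:6": {
        "Ig": (16, 70), "EGF": (104, 140),
        "TM": (173, 195), "cytoplasmic": (196, 407),
    },
    "SEQ ID NO:8": {
        "Ig": (54, 108), "EGF": (142, 178),
        "TM": (203, 225), "cytoplasmic": (226, 469),
    },
    "SEQ ID NO:32": {
        "Ig": (54, 108), "EGF": (142, 178),
        "TM": (203, 225), "cytoplasmic": (226, 647),
    },
}


def span_length(start: int, end: int) -> int:
    """Number of residues in an inclusive span."""
    return end - start + 1


def domain_at(seq_id: str, position: int):
    """Return the domain containing a residue position, or None."""
    for name, (start, end) in DOMAINS[seq_id].items():
        if start <= position <= end:
            return name
    return None


if __name__ == "__main__":
    for seq_id, domains in DOMAINS.items():
        lengths = {n: span_length(*span) for n, span in domains.items()}
        print(seq_id, lengths)
```

In all three entries the EGF span is 37 residues and the TM span is 23 residues, matching the lengths of SEQ ID NO:11 and SEQ ID NO:20.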

The invention encompasses, but is not limited to,
Don-1 proteins and polypeptides that are functionally
related to Don-1 encoded by the nucleotide sequences of
Fig. 1 (murine SEQ ID NO:1), Fig. 2 (murine SEQ ID NO:3),
Fig. 3 (human SEQ ID NO:5), Fig. 4 (human SEQ ID NO:7),
and Fig. 7 (human SEQ ID NO:31). Functionally related
proteins and polypeptides include any protein or
polypeptide sharing a functional characteristic with
Don-1, e.g., the ability to affect cell differentiation,
proliferation, or survival, and those that are active in
the p185 assay described herein.

Such functionally related Don-1 polypeptides
include, but are not limited to, polypeptides with
additions or substitutions of amino acid residues within
the amino acid sequence encoded by the don-1 cDNA
sequences described herein which result in a silent
change, thus producing a functionally equivalent gene
product. Amino acid substitutions may be made on the
basis of similarity in polarity, charge, solubility,
hydrophobicity, hydrophilicity, and/or the amphipathic
nature of the residues involved. The function of the new
polypeptide can then be tested in the p185 assay
described herein.

For example, nonpolar (hydrophobic) amino acids
include alanine, leucine, isoleucine, valine, proline,
phenylalanine, tryptophan, and methionine; polar neutral
amino acids include glycine, serine, threonine, cysteine,
tyrosine, asparagine, and glutamine; positively charged
(basic) amino acids include arginine, lysine, and
histidine; and negatively charged (acidic) amino acids
include aspartic acid and glutamic acid.
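
The four residue classes listed above translate directly into a lookup that flags whether a proposed substitution stays within one class. This is a minimal sketch; the names `RESIDUE_CLASSES`, `residue_class`, and `is_conservative` are illustrative, and treating "same class" as the test for a conservative change is a simplifying assumption.

```python
# Residue classes exactly as grouped in the text (one-letter codes).
RESIDUE_CLASSES = {
    "nonpolar": set("ALIVPFWM"),
    "polar_neutral": set("GSTCYNQ"),
    "basic": set("RKH"),
    "acidic": set("DE"),
}


def residue_class(aa: str) -> str:
    """Return the class name for a one-letter amino acid code."""
    for name, members in RESIDUE_CLASSES.items():
        if aa.upper() in members:
            return name
    raise ValueError(f"unknown amino acid: {aa!r}")


def is_conservative(original: str, replacement: str) -> bool:
    """True if both residues fall in the same class."""
    return residue_class(original) == residue_class(replacement)


if __name__ == "__main__":
    print(is_conservative("K", "R"))  # True: basic to basic
    print(is_conservative("K", "E"))  # False: basic to acidic
```

A substitution that passes this check would still need to be tested for function, for example in the p185 assay described herein.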

While random mutations can be made to don-1 DNA
(using random mutagenesis techniques well known in the
art) and the resulting mutant Don-1 proteins can be
tested for activity, site-directed mutations of the don-1
coding sequence can be engineered (using site-directed
mutagenesis techniques well known to those skilled in the
art) to generate mutant Don-1 polypeptides with increased
function, e.g., greater modulation of cell proliferation,
differentiation, or survival, or decreased function, e.g.,
down-modulation of cell proliferation, differentiation,
or survival.

To design functionally related and/or variant
Don-1 polypeptides, it is useful to distinguish between
conserved positions and variable positions. Fig. 5 shows
an alignment between the amino acid sequences of the
human and murine Don-1 polypeptides. This alignment can
be used to determine the conserved and variable amino
acid positions. To preserve Don-1 function, it is
preferable that conserved residues are not altered.
Moreover, alterations of non-conserved residues are
preferably conservative alterations, e.g., a basic amino
acid is replaced by a different basic amino acid. To
produce altered function variants, it is preferable to
make non-conservative changes at variable and/or
conserved positions. Deletions at conserved and variable
positions can also be used to create altered function
variants.

Other mutations to the don-1 coding sequence can
be made to generate Don-1 polypeptides that are better
suited for expression, scale up, etc. in a selected host
cell. For example, N-linked glycosylation sites can be
altered or eliminated to achieve, for example, expression
of a homogeneous product that is more easily recovered
and purified from yeast hosts, which are known to
hyperglycosylate N-linked sites. To this end, a variety
of amino acid substitutions at one or both of the first
or third amino acid positions of any one or more of the
glycosylation recognition sequences which occur (N-X-S
or N-X-T), and/or an amino acid deletion at the second
position of any one or more of such recognition
sequences, will prevent glycosylation at the modified
tripeptide sequence. (See, e.g., Miyajima et al., EMBO
J., 5:1193, 1986).
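
Candidate N-linked glycosylation recognition sequences (N-X-S or N-X-T) can be located with a short pattern search. The sketch below is illustrative: the function name `find_sequons` and the test peptide are hypothetical, and the pattern accepts any residue at X exactly as the text states (many prediction tools additionally exclude proline at X, which is not done here).

```python
import re

# N-X-S or N-X-T; the lookahead lets overlapping sites be reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position of N, tripeptide) for each N-X-S/T site."""
    return [(m.start() + 1, m.group(1))
            for m in SEQUON.finditer(protein.upper())]


if __name__ == "__main__":
    peptide = "MNGSANQTK"  # hypothetical test peptide
    print(find_sequons(peptide))      # [(2, 'NGS'), (6, 'NQT')]
    # Substituting the first position (N -> Q) removes the first site.
    print(find_sequons("MQGSANQTK"))  # [(6, 'NQT')]
```

Running the search before and after a planned substitution gives a quick check that the modified tripeptide no longer matches the recognition sequence.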

Preferred Don-1 polypeptides are those
polypeptides, or variants thereof, that activate
receptor-type tyrosine kinases which have a molecular
weight of 185 kDa, which includes p185 (erbB2).
Activating Don-1 polypeptides can be determined by a
standard p185 assay as described herein. Briefly, the
activity of the EGF domain of Don-1 was ascertained by
testing the ability of an EGF domain-containing fusion
polypeptide to induce phosphorylation of a 185 kDa
protein in the breast adenocarcinoma cell line MDA-MB453.
Serum-starved cells were treated with EGF, NDF, or
conditioned media from mock-transfected or Don-1
EGF-transfected 293Ebna cells as described below.
Analysis of phosphorylated proteins by Western blotting
revealed that Don-1 EGF induced phosphorylation of p185
at a level comparable to saturating amounts of NDF, which
represented an approximate ten-fold increase in
phosphorylation over uninduced cells. This result
demonstrates that the EGF domain of Don-1 binds and
activates a known member of the EGFR family, p185.

Preferred Don-1 polypeptides and variants have
20%, 50%, 75%, 90%, or even 100% or more of the activity
of the human form of Don-1 (SEQ ID NOs:6, 8, and 32)
described herein. Such comparisons are generally based
on equal concentrations of the molecules being compared.
The comparison can also be based on the amount of protein
or polypeptide required to reach 50% of the maximal
activation obtainable.

In addition to the don-1 cDNA sequences described
above, additional splice variants of the don-1 gene, and
related family members of the don-1 gene present in the
mouse, humans, or other species can be identified and
readily isolated without undue experimentation by well
known molecular biological techniques given the specific
sequences described herein. Further, genes may exist at
other genetic loci within the genome that encode proteins
which have extensive homology to Don-1 polypeptides or
one or more domains of Don-1 polypeptides. These genes
can be identified via similar techniques.

For example, to obtain additional splice variants
of the don-1 gene, an oligonucleotide probe based on the
cDNA sequences described herein, or fragments thereof,
e.g., the nucleotide region encoding the EGF domain, can
be labeled and used to screen a cDNA library constructed
from mRNA obtained from an organism of interest. To
obtain additional neuregulin-related genes related to the
don-1 gene, an oligonucleotide probe based on the
nucleotide region encoding the TM domain of Don-1 can be
used to screen a suitable cDNA library.

The preferred method of labeling is to use
32P-labeled ATP with polynucleotide kinase, as is well
known in the art, to radiolabel the oligonucleotide probe.
However, other methods may be used to label the
oligonucleotide, including, but not limited to,
biotinylation or enzyme labeling.

Hybridization is performed under stringent
conditions. Alternatively, a labeled fragment can be
used to screen a genomic library derived from the
organism of interest, again using appropriately
stringent conditions. Such stringent conditions are well
known, and will vary predictably depending on the
specific organisms from which the library and the labeled
sequences are derived.

Nucleic acid duplex or hybrid stability is
expressed as the melting temperature or Tm, which is the
temperature at which a probe dissociates from a target
DNA. This melting temperature is used to define the
required stringency conditions. If sequences are to be
identified that are related and substantially identical
to the probe, rather than identical, then it is useful to
first establish the lowest temperature at which only
homologous hybridization occurs with a particular SSC or
SSPE concentration. Then assume that 1% mismatching
results in a 1°C decrease in the Tm and reduce the
temperature of the final wash accordingly (for example,
if sequences with > 95% identity with the probe are
sought, decrease the final wash temperature by 5°C).
Note that this assumption is very approximate, and the
actual change in Tm can be between 0.5° and 1.5°C per 1%
mismatch.
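
The wash temperature adjustment described above is simple arithmetic and can be sketched as follows. The function name `final_wash_temperature`, its parameters, and the 68°C starting temperature in the example are illustrative assumptions; only the rule of about 1°C (range 0.5° to 1.5°C) per 1% mismatch comes from the text.

```python
def final_wash_temperature(
    homologous_temp_c: float,
    min_identity_percent: float,
    degrees_per_percent: float = 1.0,
) -> float:
    """Lower the final wash temperature to tolerate a given mismatch.

    homologous_temp_c: lowest temperature at which only homologous
        hybridization occurs (established empirically).
    min_identity_percent: minimum identity sought with the probe.
    degrees_per_percent: assumed Tm drop per 1% mismatch (0.5 to 1.5).
    """
    if not 0 < min_identity_percent <= 100:
        raise ValueError("identity must be in the range (0, 100]")
    mismatch_percent = 100.0 - min_identity_percent
    return homologous_temp_c - mismatch_percent * degrees_per_percent


if __name__ == "__main__":
    # Hypothetical starting point of 68 C, seeking 95% identity.
    print(final_wash_temperature(68.0, 95.0))       # 63.0
    print(final_wash_temperature(68.0, 95.0, 0.5))  # 65.5
    print(final_wash_temperature(68.0, 95.0, 1.5))  # 60.5
```

Evaluating the function at 0.5 and 1.5 degrees per percent brackets the plausible wash temperature, which reflects how approximate the 1°C rule is.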

As used herein, high stringency conditions include
hybridizing at 68°C in 5x SSC/5x Denhardt solution/1.0%
SDS, or in 0.5 M NaHPO4 (pH 7.2)/1 mM EDTA/7% SDS, or in
50% formamide/0.25 M NaHPO4 (pH 7.2)/0.25 M NaCl/1 mM
EDTA/7% SDS; and washing in 0.2x SSC/0.1% SDS at room
temperature or at 42°C, or in 0.1x SSC/0.1% SDS at 68°C,
or in 40 mM NaHPO4 (pH 7.2)/1 mM EDTA/5% SDS at 50°C, or
in 40 mM NaHPO4 (pH 7.2)/1 mM EDTA/1% SDS at 50°C.
Moderately stringent conditions include washing in 3x SSC
at 42°C. The parameters of salt concentration and
temperature can be varied to achieve the desired level of
identity between the probe and the target nucleic acid.

For guidance regarding such conditions see, for
example, Sambrook et al., 1989, Molecular Cloning, A
Laboratory Manual, Cold Spring Harbor Press, N.Y.; and
Ausubel et al. (eds.), 1995, Current Protocols in
Molecular Biology (John Wiley & Sons, N.Y.) at Unit
2.10.

In one approach, appropriate human cDNA libraries
can be screened. Such cDNA libraries can, for example,
include human breast, human prostate, or fetal human
brain or lung cDNA libraries. For example, panels of
human breast cells can be screened for don-1 expression
by, for example, Northern blot analysis. Upon detection
of don-1 transcript, cDNA libraries can be constructed
from RNA isolated from the appropriate cell line,
utilizing standard techniques well known to those of
skill in the art. The human cDNA library can then be
screened with a don-1 probe to isolate a human don-1
cDNA. As described below, this method was used to
determine the human don-1 cDNAs in Figs. 3, 4, and 7.

Alternatively, a human total genomic DNA library
can be screened using don-1 probes. don-1-positive
clones can then be sequenced and, further, the
intron/exon structure of the human don-1 gene can be
elucidated. Once genomic sequence is obtained,
oligonucleotide primers can be designed based on the
sequence for use in the isolation, via, for example,
Reverse Transcriptase-coupled PCR, of human don-1 cDNA.

Further, a previously unknown gene sequence can be
isolated by performing PCR using two degenerate
oligonucleotide primer pools designed on the basis of
nucleotide sequences within the don-1 cDNAs defined
herein. The template for the reaction can be cDNA
obtained by reverse transcription of mRNA prepared from
human or non-human cell lines or tissue known or
suspected to express a don-1 gene allele. The PCR
product can be subcloned and sequenced to ensure that the
amplified sequences represent the sequences of a don-1 or
don-1-like gene nucleic acid sequence.

The PCR fragment can then be used to isolate a
full length cDNA clone by a variety of methods. For
example, the amplified fragment can be labeled and used
to screen a bacteriophage cDNA library. Alternatively,
the labeled fragment can be used to screen a genomic
library.

PCR technology also can be used to isolate full
length cDNA sequences. For example, RNA can be isolated,
following standard procedures, from an appropriate
cellular or tissue source. A reverse transcription
reaction can be performed on the RNA using an
oligonucleotide primer specific for the most 5' end of
the amplified fragment for the priming of first strand
synthesis. The resulting RNA/DNA hybrid can then be
"tailed" with guanines using a standard terminal
transferase reaction, the hybrid can be digested with
RNase H, and second strand synthesis can then be primed
with a poly-C primer. Thus, cDNA sequences upstream of
the amplified fragment can easily be isolated. For a
review of useful cloning strategies, see, e.g., Sambrook
et al., supra; and Ausubel et al., supra.

In cases where the gene identified is the normal
(wild type) gene, this gene can be used to isolate mutant
alleles of the gene. Such an isolation is preferable in
processes and disorders which are known or suspected to
have a genetic basis. Mutant alleles can be isolated
from individuals either known or suspected to have a
genotype which contributes to tumor (e.g.,
adenocarcinoma) proliferation or progression. Mutant
alleles and mutant allele gene products can then be
utilized in the therapeutic and diagnostic assay systems
described below.

A cDNA of a mutant gene can be isolated, for
example, by using PCR, a technique which is well-known to
one skilled in the art. In this case, the first cDNA
strand can be synthesized by hybridizing an oligo-dT
oligonucleotide to mRNA isolated from tissue known or
suspected to express the gene in an individual putatively
carrying the mutant allele, and by extending the new
strand with reverse transcriptase. The second strand of
the cDNA can then be synthesized using an oligonucleotide
that hybridizes specifically to the 5' end of the normal
gene. Using these two primers, the product is then
amplified via PCR, cloned into a suitable vector, and
subjected to DNA sequence analysis by methods well known
in the art. By comparing the DNA sequence of the mutant
gene to that of the normal gene, the mutation(s)
responsible for the loss or alteration of function of the
mutant gene product can be ascertained.

Alternatively, a genomic or cDNA library can be
constructed and screened using DNA or RNA, respectively,
from a tissue known to express or suspected of expressing
the gene of interest in an individual suspected of
carrying or known to carry the mutant allele. The normal
gene or any suitable fragment thereof can then be labeled
and used as a probe to identify the corresponding mutant
allele in the library. The clone containing this gene can
then be purified through methods routinely practiced in
the art, and subjected to sequence analysis using standard
techniques as described herein.

Additionally, an expression library can be
constructed using DNA isolated from or cDNA synthesized
from a tissue known to express or suspected of expressing
the gene of interest in an individual suspected of
carrying or known to carry the mutant allele. In this
manner, gene products made by the putatively mutant
tissue can be expressed and screened using standard
antibody screening techniques in conjunction with
antibodies raised against the normal gene product, as
described herein. For screening techniques, see, for
example, Harlow, E. and Lane, eds., 1988, "Antibodies: A
Laboratory Manual," Cold Spring Harbor Press, Cold Spring
Harbor.

In cases where the mutation results in an
expressed gene product with altered function (e.g., as a
result of a missense mutation), a polyclonal set of
antibodies is likely to cross-react with the mutant gene
product. Library clones detected via their reaction with
such labeled antibodies can be purified and subjected to
sequence analysis as described herein.

Polypeptides corresponding to one or more domains
of full-length Don-1 protein, e.g., the Ig, TM, and EGF
domains, are also within the scope of the invention.
Preferred polypeptides are those which are soluble under
normal physiological conditions. Also within the
invention are fusion proteins in which a portion (e.g.,
one or more domains) of Don-1 is fused to an unrelated
protein or polypeptide (i.e., a fusion partner) to create
a fusion protein. The fusion partner can be a moiety
selected to facilitate purification, detection, or
solubilization, or to provide some other function.
Fusion proteins are generally produced by expressing a
hybrid gene in which a nucleotide sequence encoding all
or a portion of Don-1 is joined in-frame to a nucleotide
sequence encoding the fusion partner. Fusion partners
include, but are not limited to, the constant region of
an immunoglobulin (IgFc). A fusion protein in which a
Don-1 polypeptide is fused to IgFc can be more stable and
have a longer half-life in the body than the Don-1
polypeptide on its own.

Also within the scope of the invention are various
soluble forms of Don-1. For example, the entire
extracellular domain of Don-1 or a portion or domain
thereof can be expressed on its own or fused to a
solubilization partner, e.g., an immunoglobulin.

The invention also features Don-1 polypeptides
which can inhibit proliferation of adenocarcinoma cells.
The ability of the Don-1 polypeptides to inhibit
proliferation of carcinoma cells can be determined using
a standard proliferation assay, as follows. Cell (e.g.,
adenocarcinoma cell) proliferation and viability can be
measured by the cleavage of MTT as described by the
manufacturer (Boehringer Mannheim, Catalog No. 1465007).
Briefly, cells (2 x 10*^) are seeded in separate 100 µl
volumes into 96 well tissue culture plates with media
containing various concentrations of a Don-1 polypeptide.
The plates are then incubated for various times (1 to 3
days) in a humidified atmosphere of 5% CO2 at 37°C. 0.5
mg/ml MTT labeling reagent is added to each well, and the
plates are incubated for an additional four hours at
37°C. 100 µl of solubilization buffer is then added to
each well and the plates are allowed to stand for 12
hours at 37°C. The spectrophotometric absorbance at
550 and 690 nm is then measured as a gauge of cell
proliferation and viability.

In general, Don-1 proteins according to the
invention can be produced by transformation
(transfection, transduction, or infection) of a host cell
with all or part of a Don-1-encoding DNA fragment (e.g.,
one of the cDNAs described herein) in a suitable
expression vehicle. Suitable expression vehicles
include plasmids, viral particles, and phage. For
insect cells, baculovirus expression vectors are
suitable. The entire expression vehicle, or a part
thereof, can be integrated into the host cell genome. In
some circumstances, it is desirable to employ an
inducible expression vector, e.g., the LACSWITCH™
Inducible Expression System (Stratagene; La Jolla, CA).

Those skilled in the field of molecular biology
will understand that any of a wide variety of expression
systems can be used to provide the recombinant protein.
The precise host cell used is not critical to the
invention. The Don-1 protein can be produced in a
prokaryotic host (e.g., E. coli or B. subtilis) or in a
eukaryotic host (e.g., Saccharomyces or Pichia; mammalian
cells, e.g., COS, NIH 3T3, CHO, BHK, 293, or HeLa cells;
or insect cells).

Proteins and polypeptides can also be produced in
plant cells. For plant cells, viral expression vectors
(e.g., cauliflower mosaic virus and tobacco mosaic virus)
and plasmid expression vectors (e.g., Ti plasmid) are
suitable. Such cells are available from a wide range of
sources (e.g., the American Type Culture Collection,
Rockville, MD; also, see, e.g., Ausubel et al., Current
Protocols in Molecular Biology, John Wiley & Sons, New
York, 1994). The methods of transformation or
transfection and the choice of expression vehicle will
depend on the host system selected. Transformation and
transfection methods are described, e.g., in Ausubel et
al., supra; expression vehicles may be chosen from those
provided, e.g., in Cloning Vectors: A Laboratory Manual
(P.H. Pouwels et al., 1985, Supp. 1987).

The host cells harboring the expression vehicle
can be cultured in conventional nutrient media adapted as
needed for activation of a chosen gene, repression of a
chosen gene, selection of transformants, or amplification
of a chosen gene.

One preferred expression system is the mouse 3T3
fibroblast host cell transfected with a pMAMneo
expression vector (Clontech, Palo Alto, CA). pMAMneo
provides an RSV-LTR enhancer linked to a
dexamethasone-inducible MMTV-LTR promoter, an SV40 origin
of replication which allows replication in mammalian
systems, a selectable neomycin gene, and SV40 splicing
and polyadenylation sites. DNA encoding a Don-1 protein
would be inserted into the pMAMneo vector in an
orientation designed to allow expression. The
recombinant Don-1 protein would be isolated as described
below. Other preferable host cells that can be used in
conjunction with the pMAMneo expression vehicle include
COS cells and CHO cells (ATCC Accession Nos. CRL 1650 and
CCL 61, respectively).

Don-1 polypeptides can be produced as fusion
proteins. For example, the expression vector pUR278
(Ruther et al., EMBO J. 2:1791, 1983) can be used to
create lacZ fusion proteins. The pGEX vectors can be
used to express foreign polypeptides as fusion proteins
with glutathione S-transferase (GST). In general, such
fusion proteins are soluble and can be easily purified
from lysed cells by adsorption to glutathione-agarose
beads followed by elution in the presence of free
glutathione. The pGEX vectors are designed to include
thrombin or factor Xa protease cleavage sites so that the
cloned target gene product can be released from the GST
moiety.

In an insect cell expression system, Autographa
californica nuclear polyhedrosis virus (AcNPV), which
grows in Spodoptera frugiperda cells, is used as a vector
to express foreign genes. A Don-1 coding sequence can be
cloned individually into non-essential regions (for
example the polyhedrin gene) of the virus and placed
under control of an AcNPV promoter, e.g., the polyhedrin
promoter. Successful insertion of a gene encoding a Don-1
polypeptide or protein will result in inactivation of
the polyhedrin gene and production of non-occluded
recombinant virus (i.e., virus lacking the proteinaceous
coat encoded by the polyhedrin gene). These recombinant
viruses are then used to infect Spodoptera frugiperda
cells in which the inserted gene is expressed (see, e.g.,
Smith et al., J. Virol. 46:584, 1983; Smith, U.S. Patent
No. 4,215,051).

In mammalian host cells, a number of viral-based
expression systems can be utilized. When an adenovirus
is used as an expression vector, the Don-1 nucleic acid
sequence can be ligated to an adenovirus transcription/
translation control complex, e.g., the late promoter and
tripartite leader sequence. This chimeric gene can then
be inserted into the adenovirus genome by in vitro or in
vivo recombination. Insertion into a non-essential
region of the viral genome (e.g., region E1 or E3) will
result in a recombinant virus that is viable and capable
of expressing a Don-1 gene product in infected hosts
(see, e.g., Logan, Proc. Natl. Acad. Sci. USA 81:3655,
1984).

Specific initiation signals may be required for
efficient translation of inserted nucleic acid sequences.
These signals include the ATG initiation codon and
adjacent sequences. In cases where an entire native Don-1
gene or cDNA, including its own initiation codon and
adjacent sequences, is inserted into the appropriate
expression vector, no additional translational control
signals may be needed. In other cases, exogenous
translational control signals, including, perhaps, the
ATG initiation codon, must be provided. Furthermore, the
initiation codon must be in phase with the reading frame
of the desired coding sequence to ensure translation of
the entire insert. These exogenous translational control
signals and initiation codons can be of a variety of
origins, both natural and synthetic. The efficiency of
expression may be enhanced by the inclusion of
appropriate transcription enhancer elements and
transcription terminators (Bittner et al., Methods in
Enzymol. 153:516, 1987).
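
The requirement that an exogenous initiation codon be in
phase with the reading frame of the coding sequence can be
checked with simple arithmetic. The following is a minimal
sketch only; the function name and the short leader and
coding sequences are hypothetical and are not taken from
any Don-1 construct.

```python
def start_codon_in_frame(construct: str, atg_index: int, cds_start: int) -> bool:
    """Return True if the ATG at atg_index is in phase with the coding
    sequence that begins at cds_start (both are 0-based positions)."""
    construct = construct.upper()
    if construct[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given index")
    # In phase means the distance between the two positions is a whole
    # number of codons (a multiple of three nucleotides).
    return (cds_start - atg_index) % 3 == 0


# Hypothetical vector leader supplying the ATG, fused to a hypothetical insert.
leader = "ACCATGGGA"   # ATG at index 3
coding = "GCTGAATAA"   # first codon of the insert is GCT

in_frame = leader + coding
shifted = leader + "T" + coding  # one extra base shifts the frame

print(start_codon_in_frame(in_frame, 3, len(leader)))     # True
print(start_codon_in_frame(shifted, 3, len(leader) + 1))  # False
```

A check of this kind only confirms the phase; it does not
assess the sequences adjacent to the ATG that also affect
the efficiency of translation.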

In general, the signal sequence can be a component
of the expression vector, or it may be a part of don-1
DNA that is inserted into the vector. The native don-1
DNA is thought to encode a signal sequence at the amino
terminus of the polypeptide that is cleaved during
post-translational processing to form the mature Don-1
polypeptide that binds to the p185 receptor. However, a
conventional signal structure is not apparent. Native
Don-1 is secreted from cells, but may remain lodged in
the membrane because it contains a transmembrane domain
and a cytoplasmic region in the carboxyl terminal region
of the polypeptide. Thus, in a secreted, soluble version
of Don-1, the carboxyl terminal domain of the molecule,
including the transmembrane domain, is ordinarily
deleted. This truncated form of the Don-1 polypeptide
may be secreted from the cell, provided that the DNA
encoding the truncated variant encodes a signal sequence
recognized by the host.

Don-1 polypeptides can be expressed directly or as
a fusion with a heterologous polypeptide, such as a
signal sequence or other polypeptide having a specific
cleavage site at the N- and/or C-terminus of the mature
protein or polypeptide. Included within the scope of
this invention are Don-1 polypeptides with the native
signal sequence deleted and replaced with a heterologous
signal sequence. The heterologous signal sequence
selected should be one that is recognized and processed,
i.e., cleaved by a signal peptidase, by the host cell.
For prokaryotic host cells that do not recognize and
process the native Don-1 signal sequence, the signal
sequence is substituted by a prokaryotic signal sequence
selected, for example, from the group of the alkaline
phosphatase, penicillinase, lpp, or heat-stable
enterotoxin II leaders. For yeast secretion the native
Don-1 signal sequence may be substituted by the yeast
invertase, alpha factor, or acid phosphatase leaders. In
mammalian cell expression the native signal sequence is
satisfactory, although other mammalian signal sequences
may be suitable.
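
The choice of leader by host type described above can be
summarized as a small lookup table. This is an
illustrative sketch only: the dictionary keys and the
function name are hypothetical, and the entries simply
restate the leaders named in the preceding paragraph.

```python
# Leaders named in the text, grouped by host type (keys are hypothetical labels).
SIGNAL_LEADERS = {
    "prokaryote": [
        "alkaline phosphatase",
        "penicillinase",
        "lpp",
        "heat-stable enterotoxin II",
    ],
    "yeast": ["invertase", "alpha factor", "acid phosphatase"],
    "mammalian": ["native Don-1"],
}


def candidate_leaders(host_type: str) -> list[str]:
    """Return the signal sequence leaders listed for a given host type."""
    try:
        return SIGNAL_LEADERS[host_type.lower()]
    except KeyError:
        raise ValueError(f"no leaders listed for host type {host_type!r}") from None


print(candidate_leaders("yeast"))
```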

A host cell may be chosen which modulates the 
expression of the inserted sequences, or modifies and 
processes the gene product in a specific, desired 
fashion. Such modifications (e.g., glycosylation) and 
processing (e.g., cleavage) of protein products may be 
important for the function of the protein. Different 
host cells have characteristic and specific mechanisms 
for the post-translational processing and modification of 
proteins and gene products. Appropriate cell lines or 
host systems can be chosen to ensure the correct 
modification and processing of the foreign protein 
expressed. To this end, eukaryotic host cells that 
possess the cellular machinery for proper processing of 
the primary transcript, glycosylation, and 
phosphorylation of the gene product can be used. Such 
mammalian host cells include, but are not limited to, 
CHO, VERO, BHK, HeLa, COS, MDCK, 293, 3T3, WI38, and in
particular, choroid plexus cell lines. 

Alternatively, a Don-1 protein can be produced by
a stably-transfected mammalian cell line. A number of
vectors suitable for stable transfection of mammalian
cells are available to the public, see, e.g., Pouwels et
al. (supra); methods for constructing such cell lines are
also publicly available, e.g., in Ausubel et al. (supra).
In one example, cDNA encoding the Don-1 protein is cloned
into an expression vector that includes the dihydrofolate
reductase (DHFR) gene. Integration of the plasmid and,
therefore, the Don-1 protein-encoding gene into the host
cell chromosome is selected for by including 0.01-300 µM
methotrexate in the cell culture medium (as described in
Ausubel et al., supra). This dominant selection can be
accomplished in most cell types.

Recombinant protein expression can be increased by
DHFR-mediated amplification of the transfected gene.
Methods for selecting cell lines bearing gene
amplifications are described in Ausubel et al. (supra);
such methods generally involve extended culture in medium
containing gradually increasing levels of methotrexate.
DHFR-containing expression vectors commonly used for this
purpose include pCVSEII-DHFR and pAdD26SV(A) (described
in Ausubel et al., supra). Any of the host cells
described above or, preferably, a DHFR-deficient CHO cell
line (e.g., CHO DHFR- cells, ATCC Accession No. CRL 9096)
are among the host cells preferred for DHFR selection of
a stably-transfected cell line or DHFR-mediated gene
amplification.

A number of other selection systems can be used,
including but not limited to the herpes simplex virus
thymidine kinase, hypoxanthine-guanine
phosphoribosyltransferase, and adenine
phosphoribosyltransferase genes, which can be employed in
tk-, hgprt-, or aprt- cells, respectively. In addition,
gpt, which confers resistance to mycophenolic acid
(Mulligan et al., Proc. Natl. Acad. Sci. USA, 78:2072,
1981); neo, which confers resistance to the
aminoglycoside G-418 (Colberre-Garapin et al., J. Mol.
Biol., 150:1, 1981); and hygro, which confers resistance
to hygromycin (Santerre et al., Gene, 30:147, 1984), can
be used.

Alternatively, any fusion protein can be readily
purified by utilizing an antibody specific for the fusion
protein being expressed. For example, a system described
in Janknecht et al., Proc. Natl. Acad. Sci. USA, 88:8972
(1991), allows for the ready purification of
non-denatured fusion proteins expressed in human cell
lines. In this system, the gene of interest is subcloned
into a vaccinia recombination plasmid such that the
gene's open reading frame is translationally fused to an
amino-terminal tag consisting of six histidine residues.
Extracts from cells infected with recombinant vaccinia
virus are loaded onto Ni2+ nitriloacetic acid-agarose
columns, and histidine-tagged proteins are selectively
eluted with imidazole-containing buffers.

Alternatively, Don-1, or a portion thereof, can be
fused to an immunoglobulin Fc domain. Such a fusion
protein can be readily purified using a protein A column.
Moreover, such fusion proteins permit the production of a
dimeric form of a Don-1 polypeptide having increased
stability in vivo.

Don-1 proteins and polypeptides can also be
expressed in transgenic animals. Animals of any species,
including, but not limited to, mice, rats, rabbits,
guinea pigs, pigs, micro-pigs, goats, and non-human
primates, e.g., baboons, monkeys, and chimpanzees, can be
used to generate Don-1-expressing transgenic animals.
Various known techniques can be used to introduce a don-1
transgene into animals to produce the founder lines of
transgenic animals. Such techniques include, but are not
limited to, pronuclear microinjection (U.S. Pat. No.
4,873,191); retrovirus mediated gene transfer into germ
lines (Van der Putten et al., Proc. Natl. Acad. Sci.
USA, 82:6148, 1985); gene targeting into embryonic stem
cells (Thompson et al., Cell, 56:313, 1989); and
electroporation of embryos (Lo, Mol. Cell. Biol., 3:1803,
1983).

The present invention provides for transgenic
animals that carry the don-1 transgene in all their
cells, as well as animals that carry the transgene in
some, but not all of their cells, i.e., mosaic animals.
The transgene can be integrated as a single transgene or
in concatamers, e.g., head-to-head tandems or
head-to-tail tandems. The transgene can also be
selectively introduced into and activated in a particular
cell type (Lasko et al., Proc. Natl. Acad. Sci. USA,
89:6232, 1992). The regulatory sequences required for
such a cell-type specific activation will depend upon the
particular cell type of interest, and will be apparent to
those of skill in the art.

When it is desired that the don-1 transgene be
integrated into the chromosomal site of the endogenous
don-1 gene, gene targeting is preferred. Briefly, when
such a technique is to be used, vectors containing some
nucleotide sequences homologous to an endogenous don-1
gene are designed for the purpose of integrating, via
homologous recombination with chromosomal sequences, into
and disrupting the function of the nucleotide sequence of
the endogenous gene. The transgene also can be
selectively introduced into a particular cell type, thus
inactivating the endogenous don-1 gene in only that cell
type (Gu et al., Science, 265:103, 1994). The regulatory
sequences required for such a cell-type specific
inactivation will depend upon the particular cell type of
interest, and will be apparent to those of skill in the
art.

Once transgenic animals have been generated, the
expression of the recombinant don-1 gene can be assayed
utilizing standard techniques. Initial screening may be
accomplished by Southern blot analysis or PCR techniques
to analyze animal tissues to assay whether integration of
the transgene has taken place. The level of mRNA
expression of the transgene in the tissues of the
transgenic animals may also be assessed using techniques
which include, but are not limited to, Northern blot
analysis of tissue samples obtained from the animal, in
situ hybridization analysis, and RT-PCR. Samples of
don-1 gene-expressing tissue also can be evaluated
immunocytochemically using antibodies specific for the
Don-1 transgene product.

Once the recombinant Don-1 protein is expressed,
it is isolated. Secreted forms can be isolated from the
culture media, while non-secreted forms must be isolated
from the host cells. Proteins can be isolated by
affinity chromatography. In one example, an anti-Don-1
protein antibody (e.g., produced as described herein) is
attached to a column and used to isolate the Don-1
protein. Lysis and fractionation of Don-1
protein-harboring cells prior to affinity chromatography
can be performed by standard methods (see, e.g., Ausubel
et al., supra). Alternatively, a Don-1 fusion protein,
for example, a Don-1-maltose binding protein, a
Don-1-β-galactosidase, or a Don-1-trpE fusion protein,
can be constructed and used for Don-1 protein isolation
(see, e.g., Ausubel et al., supra; New England Biolabs,
Beverly, MA).

Once isolated, the recombinant protein can, if
desired, be further purified, e.g., by high performance
liquid chromatography using standard techniques (see,
e.g., Fisher, Laboratory Techniques In Biochemistry And
Molecular Biology, eds., Work and Burdon, Elsevier,
1980).

Given the amino acid sequences described herein,
polypeptides of the invention, particularly short Don-1
polypeptides, can be produced by standard chemical
synthesis (e.g., by the methods described in Solid Phase
Peptide Synthesis, 2nd ed., The Pierce Chemical Co.,
Rockford, IL, 1984).

These general techniques of polypeptide expression
and purification can also be used to produce and isolate
useful Don-1 polypeptide analogs (described herein).

The invention also features proteins which
interact with Don-1 and are involved in the function of
Don-1. Also included in the invention are the genes
encoding these interacting proteins. Interacting
proteins can be identified using methods known to those
skilled in the art. One suitable method is the
"two-hybrid system," which detects protein interactions
in vivo (Chien et al., Proc. Natl. Acad. Sci. USA,
88:9578, 1991). A kit for practicing this method is
available from Clontech (Palo Alto, CA).

Anti-Don-1 Antibodies 

Human Don-1 proteins and polypeptides (or
immunogenic fragments or analogs) can be used to raise
antibodies useful in the invention, and such polypeptides
can be produced by recombinant or peptide synthetic
techniques (see, e.g., Solid Phase Peptide Synthesis,
supra; Ausubel et al., supra). In general, the peptides
can be coupled to a carrier protein, such as KLH, as
described in Ausubel et al., supra, mixed with an
adjuvant, and injected into a host mammal. Antibodies
can be purified by peptide antigen affinity
chromatography.

In particular, various host animals can be
immunized by injection with a Don-1 protein or
polypeptide. Host animals include rabbits, mice, guinea
pigs, and rats. Various adjuvants can be used to
increase the immunological response, depending on the
host species, including but not limited to Freund's
(complete and incomplete), mineral gels such as aluminum
hydroxide, surface active substances such as
lysolecithin, pluronic polyols, polyanions, peptides, oil
emulsions, keyhole limpet hemocyanin, dinitrophenol, and
potentially useful human adjuvants such as BCG (bacille
Calmette-Guerin) and Corynebacterium parvum. Polyclonal
antibodies are heterogeneous populations of antibody
molecules derived from the sera of the immunized animals.

Antibodies within the invention include monoclonal
antibodies, polyclonal antibodies, humanized or chimeric
antibodies, single chain antibodies, Fab fragments,
F(ab')2 fragments, and molecules produced using a Fab
expression library.

Monoclonal antibodies, which are homogeneous
populations of antibodies to a particular antigen, can be
prepared using the Don-1 proteins described above and
standard hybridoma technology (see, e.g., Kohler et al.,
Nature, 256:495, 1975; Kohler et al., Eur. J. Immunol.,
6:511, 1976; Kohler et al., Eur. J. Immunol., 6:292,
1976; Hammerling et al., In Monoclonal Antibodies and T
Cell Hybridomas, Elsevier, NY, 1981; Ausubel et al.,
supra).

In particular, monoclonal antibodies can be
obtained by any technique that provides for the
production of antibody molecules by continuous cell lines
in culture such as described in Kohler et al., Nature,
256:495, 1975, and U.S. Patent No. 4,376,110; the human
B-cell hybridoma technique (Kosbor et al., Immunology
Today, 4:72, 1983; Cole et al., Proc. Natl. Acad. Sci.
USA, 80:2026, 1983); and the EBV-hybridoma technique
(Cole et al., Monoclonal Antibodies and Cancer Therapy,
Alan R. Liss, Inc., pp. 77-96, 1983). Such antibodies
can be of any immunoglobulin class including IgG, IgM,
IgE, IgA, IgD, and any subclass thereof. The hybridoma
producing the mAb of this invention can be cultivated in
vitro or in vivo. The ability to produce high titers of
mAbs in vivo makes this the presently preferred method of
production.

Once produced, polyclonal or monoclonal antibodies
are tested for specific Don-1 recognition by Western blot
or immunoprecipitation analysis by standard methods,
e.g., as described in Ausubel et al., supra. Antibodies
that specifically recognize and bind to Don-1 are useful
in the invention. For example, such antibodies can be
used in an immunoassay to monitor the level of Don-1
produced by a mammal (for example, to determine the
amount or subcellular location of Don-1).

Preferably, antibodies of the invention are
produced using fragments of the Don-1 protein which lie
outside highly conserved regions and appear likely to be
antigenic, by criteria such as high frequency of charged
residues. In one specific example, such fragments are
generated by standard techniques of PCR, and are then
cloned into the pGEX expression vector (Ausubel et al.,
supra). Fusion proteins are expressed in E. coli and
purified using a glutathione agarose affinity matrix as
described in Ausubel et al., supra.
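
The criterion of a high frequency of charged residues can
be screened computationally with a sliding window. The
sketch below is illustrative only. It assumes that
"charged" means Asp, Glu, Lys, and Arg, uses a
hypothetical window size and a hypothetical test peptide,
and does not address the separate requirement that the
fragment lie outside highly conserved regions.

```python
# Assumption: residues counted as charged are D, E, K and R.
CHARGED = set("DEKR")


def charged_fraction_windows(protein: str, window: int = 20) -> list[tuple[int, float]]:
    """Return (start, fraction of charged residues) for every window."""
    protein = protein.upper()
    if not 0 < window <= len(protein):
        raise ValueError("window must be between 1 and the protein length")
    results = []
    for start in range(len(protein) - window + 1):
        segment = protein[start:start + window]
        fraction = sum(aa in CHARGED for aa in segment) / window
        results.append((start, fraction))
    return results


def best_window(protein: str, window: int = 20) -> tuple[int, float]:
    """Return the first window with the highest charged-residue fraction."""
    return max(charged_fraction_windows(protein, window), key=lambda item: item[1])


# Hypothetical peptide, not a Don-1 sequence.
peptide = "AAAAAKDEKRAAAAA"
print(best_window(peptide, window=5))  # (5, 1.0)
```

Candidate windows found this way would still need to be
compared against an alignment of related proteins to
exclude conserved regions before fragments are cloned.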

Antibodies can also be prepared to bind
specifically to one or more particular domains of Don-1,
such as the EGF domain (SEQ ID NO:11), by immunizing an
animal with a polypeptide corresponding to only the
desired domain or domains.

In some cases it may be desirable to minimize the
potential problems of low affinity or specificity of
antisera. In such circumstances, two or three fusions
can be generated for each protein, and each fusion can be
injected into at least two rabbits. Antisera can be
raised by injections in a series, preferably including at
least three booster injections.

Antisera are also checked for their ability to
immunoprecipitate recombinant Don-1 proteins or control
proteins, such as glucocorticoid receptor, CAT, or
luciferase.

The antibodies can be used, for example, in the
detection of Don-1 in a biological sample as part of
a diagnostic assay. Antibodies also can be used in a
screening assay to measure the effect of a candidate
compound on expression or localization of Don-1.
Additionally, such antibodies can be used in conjunction
with the gene therapy techniques described to, for
example, evaluate the normal and/or engineered
Don-1-expressing cells prior to their introduction into
the patient. Such antibodies additionally can be used in
a method for inhibiting abnormal Don-1 activity.

Techniques developed for the production of
"chimeric antibodies" (Morrison et al., Proc. Natl. Acad.
Sci., 81:6851, 1984; Neuberger et al., Nature, 312:604,
1984; Takeda et al., Nature, 314:452, 1985) can be used
to splice the genes from a mouse antibody molecule of
appropriate antigen specificity together with genes from
a human antibody molecule of appropriate biological
activity. A chimeric antibody is a molecule in which
different portions are derived from different animal
species, such as those having a variable region derived
from a murine mAb and a human immunoglobulin constant
region.

Alternatively, techniques described for the
production of single chain antibodies (U.S. Patent Nos.
4,946,778 and 4,704,692) can
be adapted to produce single chain antibodies against a
Don-1 protein or polypeptide. Single chain antibodies
are formed by linking the heavy and light chain fragments
of the Fv region via an amino acid bridge, resulting in a
single chain polypeptide.

Antibody fragments that recognize and bind to
specific epitopes can be generated by known techniques.
For example, such fragments can include but are not
limited to F(ab')2 fragments, which can be produced by
pepsin digestion of the antibody molecule, and Fab
fragments, which can be generated by reducing the
disulfide bridges of F(ab')2 fragments. Alternatively,
Fab expression libraries can be constructed (Huse et al.,
Science, 246:1275, 1989) to allow rapid and easy
identification of monoclonal Fab fragments with the
desired specificity.

Antibodies to Don-1 can, in turn, be used to
generate anti-idiotype antibodies that resemble a portion
of Don-1, using techniques well known to those skilled in
the art (see, e.g., Greenspan et al., FASEB J., 7:437,
1993; Nisonoff, J. Immunol., 147:2429, 1991). For
example, antibodies that bind to Don-1 and competitively
inhibit the binding of a ligand of Don-1 can be used to
generate anti-idiotypes that resemble a ligand binding
domain of Don-1 and, therefore, bind and neutralize a
ligand of Don-1. Such neutralizing anti-idiotypic
antibodies or Fab fragments of such anti-idiotypic
antibodies can be used in therapeutic regimens.

In addition, antibodies can be expressed within an
intracellular compartment of a cell, such as the
endoplasmic reticulum, to specifically bind to a target
protein or polypeptide within the cell. Such specific
binding can be used to alter, e.g., inhibit, the function
of the target protein. Intracellular expression of
antibodies is achieved by introducing into the cells
nucleic acids that encode the antibodies, e.g., by using
a recombinant viral vector or other vector system
suitable for delivering a gene to a cell in vivo.

Preferably the antibody is a single chain Fv
fragment, although whole antibodies, or antigen binding
fragments thereof, e.g., Fab fragments, can be used.
Targeting of an antibody to an intracellular compartment
can be accomplished by incorporating an appropriate
signal sequence into the antibody. For example, a
nucleic acid can be designed to include a first
nucleotide sequence encoding a signal sequence (e.g., to
an endoplasmic reticulum), operatively linked in a 5' to
3' direction by a phosphodiester bond to a second
nucleotide sequence encoding a single chain Fv fragment
that binds to a Don-1 polypeptide. These techniques are
described in detail in Curiel et al., PCT Publication No.
WO 96/07321.

Modulating Don-1 Expression 

Don-1 polypeptides can be administered to
stimulate the proliferation of cells, such as epithelial
cells, e.g., to promote wound healing. Other therapies,
e.g., anti-tumor therapies, can be designed to reduce the
level of endogenous Don-1 gene expression, e.g., using
antisense or ribozyme approaches to inhibit or prevent
translation of Don-1 mRNA transcripts; triple helix
approaches to inhibit transcription of the Don-1 gene; or
targeted homologous recombination to inactivate or "knock
out" the Don-1 gene or its endogenous promoter.

Because the Don-1 gene is expressed in the brain,
delivery techniques should preferably be designed to
cross the blood-brain barrier (see, e.g., PCT Publication
No. WO 89/10134). Alternatively, the antisense, ribozyme,
or DNA constructs described herein could be administered
directly to the site containing the target cells; e.g.,
brain, kidney, lung, uterus, endothelial and epithelial
cells, fibroblasts, and breast and prostate cells.

Antisense Nucleic Acids

Antisense approaches involve the design of
oligonucleotides (either DNA or RNA) that are
complementary to Don-1 mRNA. The antisense
oligonucleotides bind to the complementary Don-1 mRNA
transcripts and prevent translation. Absolute
complementarity, although preferred, is not required. A
sequence "complementary" to a portion of an RNA, as
referred to herein, means a sequence having sufficient
complementarity to be able to hybridize with the RNA and
form a stable duplex; in the case of double-stranded
antisense nucleic acids, a single strand of the duplex
DNA can be tested, or triplex formation can be assayed.
The ability to hybridize will depend on both the degree
of complementarity and the length of the antisense
nucleic acid. Generally, the longer the hybridizing
nucleic acid, the more base mismatches with an RNA it may
contain and still form a stable duplex (or triplex, as
the case may be). One skilled in the art can ascertain a
tolerable degree of mismatch by use of standard
procedures to determine the melting point of the
hybridized complex.
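
Before a melting point is measured, a rough estimate for
a short oligonucleotide can be obtained from its base
composition. The sketch below uses the Wallace rule of
thumb (2 degrees C per A or T and 4 degrees C per G or C),
which is a substituted approximation for short
oligonucleotides and not the standard laboratory
procedure referred to above. The function names and the
example sequences are hypothetical.

```python
def wallace_tm(oligo: str) -> int:
    """Estimate the melting temperature (degrees C) of a short DNA
    oligonucleotide with the Wallace rule: 2*(A+T) + 4*(G+C)."""
    oligo = oligo.upper()
    if set(oligo) - set("ACGT"):
        raise ValueError("oligo must contain only A, C, G and T")
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


def count_mismatches(oligo: str, perfect_match: str) -> int:
    """Count positions at which an oligo differs from the perfectly
    complementary oligo of the same length."""
    if len(oligo) != len(perfect_match):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(oligo.upper(), perfect_match.upper()))


print(wallace_tm("ATGCATGC"))              # 24
print(count_mismatches("ATGCATGC", "ATGGATGC"))  # 1
```

The estimate ignores salt concentration, mismatches, and
backbone modifications, so it is at most a starting point
for choosing conditions under which the melting point is
then determined experimentally.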

Oligonucleotides that are complementary to the 5'
end of the message, e.g., the 5' untranslated sequence up
to and including the AUG initiation codon, should work
most efficiently at inhibiting translation. However,
sequences complementary to the 3' untranslated sequences
of mRNAs have been shown to be effective at inhibiting
translation of mRNAs as well (Wagner, Nature, 372:333,
1994). Thus, oligonucleotides complementary to either
the 5'- or 3'- non-translated, non-coding regions of the
don-1 gene, e.g., the human gene, as represented by the
cDNA (SEQ ID NO:5) shown in Fig. 3, can be used in an
antisense approach to inhibit translation of endogenous
Don-1 mRNA. Oligonucleotides complementary to the 5'
untranslated region of the mRNA should include the
complement of the AUG start codon.
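
The design of an oligonucleotide that is complementary to
the 5' untranslated sequence up to and including the AUG
initiation codon can be sketched as follows. This is an
illustration under stated assumptions: the function names
are hypothetical, the position of the initiation codon is
supplied by the user, and the example mRNA is an invented
sequence, not the don-1 cDNA of SEQ ID NO:5.

```python
# Map each RNA base to the complementary DNA base.
RNA_TO_DNA_COMPLEMENT = str.maketrans("ACGU", "TGCA")


def antisense_dna(mrna_segment: str) -> str:
    """Return the DNA oligo (5'->3') complementary to an mRNA segment."""
    return mrna_segment.upper().translate(RNA_TO_DNA_COMPLEMENT)[::-1]


def antisense_over_start(mrna: str, aug_index: int, length: int = 15) -> str:
    """Design an antisense DNA oligo whose target ends at the last base of
    the AUG at aug_index and extends upstream into the 5' untranslated
    sequence."""
    mrna = mrna.upper().replace("T", "U")
    if mrna[aug_index:aug_index + 3] != "AUG":
        raise ValueError("no AUG at the given index")
    end = aug_index + 3
    start = max(0, end - length)
    return antisense_dna(mrna[start:end])


# Hypothetical message: 11 nt of 5' untranslated sequence, then AUG.
mrna = "GGCUAGCCACCAUGGCUUAA"
oligo = antisense_over_start(mrna, aug_index=11, length=10)
print(oligo)  # CATGGTGGCT
```

Because the target ends with AUG, the oligonucleotide
written 5' to 3' begins with CAT, the complement of the
start codon.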

Antisense oligonucleotides complementary to mRNA
coding regions are less efficient inhibitors of
translation but could be used in accordance with the
invention. Whether designed to hybridize to the 5'-, 3'-,
or coding region of Don-1 mRNA, antisense nucleic acids
should be at least six nucleotides in length, and are
preferably oligonucleotides ranging from 6 to about 50
nucleotides in length. In specific aspects the
oligonucleotide is at least 10 nucleotides, at least 17
nucleotides, at least 25 nucleotides or at least 50
nucleotides.

Regardless of the choice of target sequence, it is
preferred that in vitro studies are first performed to
quantitate the ability of the antisense oligonucleotide
to inhibit gene expression.

It is preferred that these studies utilize
controls that distinguish between antisense gene
inhibition and nonspecific biological effects of
oligonucleotides. It is also preferred that these
studies compare levels of the target RNA or protein with
that of an internal control RNA or protein.
Additionally, it is envisioned that results obtained
using the antisense oligonucleotide are compared with
those obtained using a control oligonucleotide. It is
preferred that the control oligonucleotide is of
approximately the same length as the test oligonucleotide
and that the nucleotide sequence of the oligonucleotide
differs from the antisense sequence no more than is
necessary to prevent specific hybridization to the target
sequence.
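
One way to obtain a control oligonucleotide of the same
length that differs only minimally from the antisense
sequence is to substitute a small number of evenly spaced
bases. The sketch below is illustrative only: the
substitution scheme, the default of four mismatches, and
the example sequence are assumptions, and the number of
changes actually needed to prevent specific hybridization
must be established experimentally.

```python
# Hypothetical substitution scheme: each base is replaced by a
# non-complementary, non-identical base.
SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}


def mismatch_control(antisense: str, n_mismatches: int = 4) -> str:
    """Return a control oligo of the same length as the antisense oligo,
    with n_mismatches evenly spaced base substitutions."""
    bases = list(antisense.upper())
    if not 0 < n_mismatches <= len(bases):
        raise ValueError("n_mismatches must be between 1 and the oligo length")
    step = len(bases) / n_mismatches
    for k in range(n_mismatches):
        # Place each substitution near the middle of its segment.
        position = int(k * step + step / 2)
        bases[position] = SWAP[bases[position]]
    return "".join(bases)


# Hypothetical 15-nucleotide antisense sequence.
antisense = "CATGGTGGCTAGCCT"
control = mismatch_control(antisense)
print(control)
print(sum(a != b for a, b in zip(antisense, control)))  # 4
```

Keeping the length and most of the base composition
unchanged helps attribute any difference between test and
control to sequence-specific hybridization rather than to
nonspecific effects of the oligonucleotide.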

The oligonucleotides can be DNA or RNA, or
chimeric mixtures, or derivatives or modified versions
thereof, and can be single-stranded or double-stranded.
The oligonucleotides can be modified at the base moiety,
sugar moiety, or phosphate backbone, for example, to
improve stability of the molecule, hybridization, etc.
The oligonucleotide may include other appended groups
such as peptides (e.g., for targeting host cell receptors
in vivo), or agents facilitating transport across the
cell membrane (as described, e.g., in Letsinger et al.,
Proc. Natl. Acad. Sci. USA, 86:6553, 1989; Lemaitre et
al., Proc. Natl. Acad. Sci. USA, 84:648, 1987; PCT
Publication No. WO 88/09810) or the blood-brain barrier
(see, e.g., PCT Publication No. WO 89/10134), or
hybridization-triggered cleavage agents (see, e.g., Krol
et al., BioTechniques, 6:958, 1988), or intercalating
agents (see, e.g., Zon, Pharm. Res., 5:539, 1988). To
this end, the oligonucleotide can be conjugated to
another molecule, e.g., a peptide,
hybridization-triggered cross-linking agent, transport
agent, or hybridization-triggered cleavage agent.

The antisense oligonucleotide can include at least
one modified base moiety selected from the group
including, but not limited to, 5-fluorouracil,
5-bromouracil, 5-chlorouracil, 5-iodouracil, hypoxanthine,
xanthine, 4-acetylcytosine, 5-(carboxyhydroxylmethyl)
uracil, 5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine, N6-isopentenyladenine,
1-methylguanine, 1-methylinosine, 2,2-dimethylguanine,
2-methyladenine, 2-methylguanine, 3-methylcytosine,
5-methylcytosine, N6-adenine, 7-methylguanine,
5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil,
beta-D-mannosylqueosine, 5'-methoxycarboxymethyluracil,
5-methoxyuracil, 2-methylthio-N6-isopentenyladenine,
uracil-5-oxyacetic acid (v), wybutoxosine, pseudouracil,
queosine, 2-thiocytosine, 5-methyl-2-thiouracil,
2-thiouracil, 4-thiouracil, 5-methyluracil,
uracil-5-oxyacetic acid methylester, uracil-5-oxyacetic
acid (v), 5-methyl-2-thiouracil,
2-(3-amino-3-N-2-carboxypropyl) uracil, (acp3)w, and
2,6-diaminopurine.

The antisense oligonucleotide can also include at
least one modified sugar moiety selected from the group
including, but not limited to, arabinose,
2-fluoroarabinose, xylulose, and hexose.

In yet another embodiment, the antisense
oligonucleotide includes at least one modified phosphate
backbone, e.g., a phosphorothioate, a phosphorodithioate,
a phosphoramidothioate, a phosphoramidate, a
phosphordiamidate, a methylphosphonate, an alkyl
phosphotriester, and a formacetal, or an analog of any of
these backbones.

In addition, the antisense oligonucleotide can be
an α-anomeric oligonucleotide that forms specific
double-stranded hybrids with complementary RNA in which,
contrary to the usual β-units, the strands run parallel
to each other (Gautier et al., Nucl. Acids Res.,
15:6625, 1987). The oligonucleotide can be a
2'-O-methylribonucleotide (Inoue et al., Nucl. Acids Res.,
15:6131, 1987), or a chimeric RNA-DNA analog (Inoue et
al., FEBS Lett., 215:327, 1987).

Antisense oligonucleotides of the invention can be
synthesized by standard methods known in the art, e.g.,
by use of an automated DNA synthesizer (such as are
commercially available from Biosearch, Applied
Biosystems, etc.). As examples, phosphorothioate
oligonucleotides can be synthesized by the method of
Stein et al., Nucl. Acids Res., 16:3209, 1988, and
methylphosphonate oligonucleotides can be prepared by use
of controlled pore glass polymer supports (Sarin et al.,
Proc. Natl. Acad. Sci. USA, 85:7448, 1988).

While antisense nucleotides complementary to the
Don-1 coding region sequence could be used, those
complementary to the transcribed untranslated region are
most preferred.

One example of a 15 nucleotide antisense sequence
to the human don-1 gene is directed against the EGF
domain: 5'-GACTTGGCTCTCTCG-3' (SEQ ID NO:22). Another
example of a 15 nucleotide antisense sequence to the
human don-1 gene is: 5'-GGACTCCGACMTCT-3' (SEQ ID
NO:23), where the underlined sequence represents the
complement of the initiator methionine codon.
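
As an illustration of how an antisense oligonucleotide
relates to its target, the following minimal Python sketch
computes the reverse complement of SEQ ID NO:22 as printed
above. The helper name reverse_complement is hypothetical,
and the resulting sense-strand stretch is computed here
rather than quoted from the sequence listing.

```python
# Complement table for unambiguous DNA bases (IUPAC ambiguity
# codes are deliberately not handled in this sketch).
_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


# SEQ ID NO:22 as printed in the text (antisense oligonucleotide, 5'->3').
antisense = "GACTTGGCTCTCTCG"

# Sense-strand stretch the oligonucleotide would base-pair with
# (derived by computation, not taken from the figures).
target = reverse_complement(antisense)
print(target)
```

Applying the function twice returns the original
oligonucleotide, which is a quick sanity check when
designing antisense sequences against a chosen region.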

The antisense molecules should be delivered to
cells that express Don-1 in vivo, e.g., brain, kidney,
lung, uterus, endothelial and epithelial cells,
fibroblasts, and breast and prostate cells. A number of
methods have been developed for delivering antisense DNA
or RNA to cells; e.g., antisense molecules can be
injected directly into the tissue site, or modified
antisense molecules, designed to target the desired cells
(e.g., antisense linked to peptides or antibodies that
specifically bind receptors or antigens expressed on the
target cell surface) can be administered systemically.

However, it is often difficult to achieve
intracellular concentrations of the antisense molecules
sufficient to suppress translation of endogenous mRNAs.
Therefore, a preferred approach uses a recombinant DNA
construct in which the antisense oligonucleotide is
placed under the control of a strong pol III or pol II
promoter. The use of such a construct to transfect
target cells in the patient will result in the
transcription of sufficient amounts of single stranded
RNAs that will form complementary base pairs with the
endogenous Don-1 transcripts and thereby prevent
translation of the Don-1 mRNA. For example, a vector can
be introduced in vivo such that it is taken up by a cell
and directs the transcription of an antisense RNA. Such
a vector can remain episomal or become chromosomally
integrated, as long as it can be transcribed to produce
the desired antisense RNA.

Such vectors can be constructed by recombinant DNA
technology methods standard in the art. Vectors can be
plasmid, viral, or others known in the art, used for
replication and expression in mammalian cells.
Expression of the sequence encoding the antisense RNA can
be by any promoter known in the art to act in mammalian,
preferably human, cells. Such promoters can be inducible
or constitutive. Such promoters include, but are not
limited to: the SV40 early promoter region (Bernoist et
al., Nature, 290:304, 1981); the promoter contained in
the 3' long terminal repeat of Rous sarcoma virus
(Yamamoto et al., Cell, 22:787-797, 1988); the herpes
thymidine kinase promoter (Wagner et al., Proc. Natl.
Acad. Sci. USA, 78:1441, 1981); or the regulatory
sequences of the metallothionein gene (Brinster et al.,
Nature, 296:39, 1988).

Any type of plasmid, cosmid, YAC, or viral vector
can be used to prepare the recombinant DNA construct
which can be introduced directly into the tissue site;
e.g., the brain, kidney, lung, uterus, endothelial and
epithelial cells, fibroblasts, and breast and prostate
cells. Alternatively, viral vectors can be used that
selectively infect the desired tissue (e.g., for brain,
herpesvirus vectors may be used), in which case
administration can be accomplished by another route
(e.g., systemically).
Ribozymes 

Ribozyme molecules designed to catalytically
cleave Don-1 mRNA transcripts also can be used to prevent
translation of Don-1 mRNA and expression of Don-1 (see,
e.g., PCT Publication WO 90/11364; Sarver et al.,
Science, 247:1222, 1990). While various ribozymes that
cleave mRNA at site-specific recognition sequences can be
used to destroy Don-1 mRNAs, the use of hammerhead
ribozymes is preferred. Hammerhead ribozymes cleave
mRNAs at locations dictated by flanking regions that form
complementary base pairs with the target mRNA. The sole
requirement is that the target mRNA have the following
sequence of two bases: 5'-UG-3'. The construction and
production of hammerhead ribozymes is known in the art
(Haseloff et al., Nature, 334:585, 1988). There are
numerous examples of potential hammerhead ribozyme
cleavage sites within the nucleotide sequence of human
Don-1 cDNAs (Figs. 2 and 4). Preferably, the ribozyme is
engineered so that the cleavage recognition site is
located near the 5' end of the Don-1 mRNA, i.e., to
increase efficiency and minimize the intracellular
accumulation of non-functional mRNA transcripts.

Examples of potential ribozyme sites in human
Don-1 include 5'-UG-3' sites which correspond to the
initiator methionine codon (nucleotides 664-666 in human
SEQ ID NO:5 and 69-71 in human SEQ ID NOs:7 and 31) and
the codons for each of the cysteine residues of the EGF
domain (e.g., nucleotides 493-494, 517-519, 535-537,
568-570, 574-576, and 601-603 in human SEQ ID NOs:7 and 31,
and nucleotides 973-975, 997-999, 1015-1017, 1048-1050,
1054-1056, and 1081-1083 in human SEQ ID NO:5).
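
The search for candidate 5'-UG-3' sites described above can
be sketched in a few lines of Python. This is only an
illustration under stated assumptions: the function name
find_ug_sites is hypothetical, positions are reported
1-based at the U, and the toy sequence is invented for the
example rather than taken from the Don-1 cDNAs.

```python
from __future__ import annotations


def find_ug_sites(rna: str) -> list[int]:
    """Return 1-based positions of the U in every 5'-UG-3' dinucleotide."""
    # Accept cDNA-style input (T) as well as RNA (U).
    rna = rna.upper().replace("T", "U")
    return [i + 1 for i in range(len(rna) - 1) if rna[i:i + 2] == "UG"]


# Toy sequence for illustration only; not a Don-1 sequence.
toy_mrna = "GCAUGGCUUGCA"
print(find_ug_sites(toy_mrna))
```

Because an AUG start codon always contains a UG
dinucleotide, the initiator methionine codon appears in the
output of such a scan, consistent with the sites listed in
the text.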

The ribozymes of the present invention also
include RNA endoribonucleases (hereinafter "Cech-type
ribozymes"), such as the one that occurs naturally in
Tetrahymena thermophila (known as the IVS or L-19 IVS
RNA), and which has been extensively described by Cech
and his collaborators (Zaug et al., Science, 224:574,
1984; Zaug et al., Science, 231:470, 1986; Zaug et al.,
Nature, 324:429, 1986; PCT Application No. WO 88/04300;
and Been et al., Cell, 47:207, 1986). The Cech-type
ribozymes have an eight base-pair sequence that
hybridizes to a target RNA sequence, whereafter cleavage
of the target RNA takes place. The invention encompasses
those Cech-type ribozymes that target eight base-pair
active site sequences present in Don-1 polypeptides.

As in the antisense approach, the ribozymes can be
composed of modified oligonucleotides (e.g., for improved
stability, targeting, etc.), and should be delivered to
cells which express the Don-1 in vivo, e.g., brain,
kidney, lung, uterus, endothelial and epithelial cells,
fibroblasts, and breast and prostate cells. A preferred
method of delivery involves using a DNA construct
"encoding" the ribozyme under the control of a strong
constitutive pol III or pol II promoter, so that
transfected cells will produce sufficient quantities of
the ribozyme to destroy endogenous Don-1 messages and
inhibit translation. Because ribozymes, unlike antisense
molecules, are catalytic, a lower intracellular
concentration is required for efficiency.

Other Methods for Reducing Don-1 Expression
Endogenous don-1 gene expression can also be
reduced by inactivating or "knocking out" the don-1 gene
or its promoter using targeted homologous recombination
(see, e.g., U.S. Patent No. 5,464,764). For example, a
mutant, non-functional don-1 (or a completely unrelated
DNA sequence) flanked by DNA homologous to the endogenous
don-1 gene (either the coding regions or regulatory
regions of the don-1 gene) can be used, with or without a
selectable marker and/or a negative selectable marker, to
transfect cells that express Don-1 in vivo. Insertion of
the DNA construct, via targeted homologous recombination,
results in inactivation of the don-1 gene. Such
approaches are particularly suited for use in the
agricultural field where modifications to ES (embryonic
stem) cells can be used to generate animal offspring with
an inactive don-1 gene. This approach can be adapted for
use in humans, provided the recombinant DNA constructs
are directly administered or targeted to the required
site in vivo using appropriate viral vectors, e.g.,
herpes virus vectors for delivery to brain tissue.

Alternatively, endogenous don-1 gene expression
can be reduced by targeting deoxyribonucleotide sequences
complementary to the regulatory region of the don-1 gene
(i.e., don-1 promoters and/or enhancers located upstream
to the start codon in the untranslated region) to form
triple helical structures that prevent transcription of
the don-1 gene in target cells in the body (Helene,
Anticancer Drug Des., 6:569, 1981; Helene et al., Ann.
N.Y. Acad. Sci., 660:27, 1992; and Maher, Bioassays,
14:807, 1992).

Identification of Proteins That Interact With Don-1 

The invention also features proteins that interact
with Don-1 polypeptides. Any method suitable for
detecting protein-protein interactions can be employed to
identify transmembrane, intracellular, or extracellular
proteins that interact with Don-1 polypeptides. Among
the traditional methods which can be employed are
co-immunoprecipitation, crosslinking and co-purification
through gradients or chromatographic columns of cell
lysates or proteins obtained from cell lysates, and the
use of Don-1 polypeptides to identify proteins in the
lysate that interact with the Don-1 polypeptide.

For these assays, the Don-1 polypeptide can be a
full length Don-1, a soluble extracellular domain of
Don-1, or some other suitable Don-1 polypeptide, e.g., a
polypeptide including the EGF domain of Don-1. Once
isolated, such an interacting protein can be identified
and cloned and then used, in conjunction with standard
techniques, to identify proteins with which it interacts.
For example, at least a portion of the amino acid
sequence of a protein which interacts with a Don-1
polypeptide can be ascertained using techniques well
known to those of skill in the art, such as via the Edman
degradation technique. The amino acid sequence obtained
can be used as a guide to generate oligonucleotide
mixtures that can be used to screen for gene sequences
encoding the interacting protein. Screening can be
accomplished, for example, by standard hybridization or
PCR techniques. Techniques for generating
oligonucleotide mixtures and the screening are known.
See, e.g., Ausubel, supra; and PCR Protocols: A Guide to
Methods and Applications, 1990, Innis et al., eds.
Academic Press, Inc., New York.

Additionally, methods may be employed which result
in the direct identification of genes that encode
proteins that interact with Don-1 polypeptides. These
methods include, for example, screening expression
libraries, in a manner similar to the well known
technique of antibody probing of λgt11 libraries, using a
labeled Don-1 polypeptide or a Don-1 fusion protein,
e.g., a Don-1 domain fused to a marker such as an enzyme,
fluorescent dye, a luminescent protein, or to an IgFc
domain.

There are also methods for detecting protein
interactions, e.g., the in vivo two-hybrid system (Chien
et al., Proc. Natl. Acad. Sci. USA, 88:9578, 1991). A
kit for practicing this method is available from Clontech
(Palo Alto, CA). Briefly, to use this system, plasmids
are constructed that encode two hybrid proteins. One
plasmid includes a nucleotide sequence encoding the
DNA-binding domain of a transcription activator protein fused
to a nucleotide sequence encoding a full-length Don-1
protein, a Don-1 polypeptide, or a Don-1 fusion protein.
The other plasmid includes a nucleotide sequence encoding
the transcription activator protein's activation domain
fused to a cDNA encoding an unknown protein from which a
cDNA library has been recombined into this plasmid. The
DNA-binding domain fusion plasmid and the cDNA library
are transformed into a strain of the yeast Saccharomyces
cerevisiae that contains a reporter gene (e.g., HBS or
lacZ) whose regulatory region contains the transcription
activator's binding site.

Either hybrid protein alone cannot activate
transcription of the reporter gene. The DNA-binding
domain hybrid cannot because it does not provide
activation function, and the activation domain hybrid
cannot because it cannot localize to the activator's
binding sites. Interaction of the appropriate two hybrid
proteins reconstitutes the functional activator protein
and results in expression of the reporter gene, which is
detected by an assay for the reporter gene product.
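
The logic of the two-hybrid readout described above can be
summarized as a small truth-table sketch. It is a deliberate
simplification (real assays show background and
autoactivation), and the function and parameter names are
hypothetical.

```python
def reporter_expressed(has_bd_hybrid: bool,
                       has_ad_hybrid: bool,
                       proteins_interact: bool) -> bool:
    """Idealized two-hybrid readout.

    The reporter is transcribed only when the DNA-binding-domain (BD)
    hybrid and the activation-domain (AD) hybrid are both present and
    the two fused proteins bind each other.
    """
    return has_bd_hybrid and has_ad_hybrid and proteins_interact


# Either hybrid alone gives no reporter expression.
print(reporter_expressed(True, False, False))
print(reporter_expressed(False, True, False))
# Both hybrids present and interacting: activator is reconstituted.
print(reporter_expressed(True, True, True))
```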

The two-hybrid system and related methods can be
used to screen activation domain libraries for proteins
that interact with a "bait" gene product. By way of
example, a Don-1 polypeptide can be used as the bait gene
product. Total genomic or cDNA sequences are fused to
DNA encoding an activation domain. This library and a
plasmid encoding a hybrid of bait Don-1 gene product
fused to the DNA-binding domain are cotransformed into a
yeast reporter strain, and the resulting transformants
are screened for those that express the reporter gene.
For example, a bait don-1 gene sequence encoding a Don-1
polypeptide, or a domain of Don-1, can be cloned into a
vector such that it is translationally fused to DNA
encoding the DNA-binding domain of the GAL4 protein.
These colonies are purified and the library plasmids
responsible for reporter gene expression are isolated.
DNA sequencing is then used to identify the proteins
encoded by the library plasmids.

A cDNA library of the cell line from which
proteins that interact with bait don-1 gene product are
to be detected can be made using methods routinely
practiced in the art. According to the particular system
described herein, for example, cDNA fragments can be
inserted into a vector such that they are translationally
fused to the transcriptional activation domain of GAL4.
This library can be co-transformed along with the bait
don-1 gene-GAL4 fusion plasmid into a yeast strain which
contains a lacZ gene driven by a promoter that contains a
GAL4 activation sequence. A cDNA encoded protein, fused
to GAL4 transcriptional activation domain, that interacts
with bait don-1 gene product will reconstitute an active
GAL4 protein and thereby drive expression of the HIS3
gene. Colonies that express HIS3 then can be purified
from these strains, and used to produce and isolate the
bait don-1 gene-interacting protein using techniques
routinely practiced in the art.

Therapeutic Applications 

The Don-1 proteins and polypeptides described
herein stimulate proliferation of epithelial cells and
are thus particularly implicated in melanomas and
adenocarcinomas in which epithelial cells proliferate out
of control. Accordingly, undesirable tumors, such as
melanomas and adenocarcinomas of the skin, esophagus,
lung, breast, liver, pancreas, gastrointestinal tract,
colon, prostate, and uterus can be reduced by the
administration of a compound that interferes with Don-1
expression or function (e.g., an antibody). Compounds
that interfere with Don-1 function can also be used to
treat other undesirable disease processes, e.g., cyst and
polyp formation.

In addition, since Don-1 polypeptides promote or
stimulate epithelial cell proliferation, the topical
administration of Don-1 polypeptides to wounds promotes
wound healing.

Because Don-1 is highly expressed in the brain,
Don-1 also may play a significant role regulating tumor
formation and progression in the brain. Of course, in
some circumstances, including certain phases of many of
the above-described conditions, it may be desirable to
enhance Don-1 function, e.g., to stimulate cell
proliferation or differentiation, or enhance or suppress
apoptosis.

Recombinant Don-1 should facilitate the production
of pharmacologic modifiers and inhibitors of Don-1
function. Compounds that interfere with Don-1 function
include molecules that bind to Don-1, such as antibodies,
and prevent it from binding with its receptors, e.g.,
p185, or small molecules or anti-idiotype antibodies
that mimic certain domains of Don-1, such as the EGF
domain, and bind, preferably irreversibly, to Don-1
receptors without activating these receptors, e.g.,
without causing phosphorylation or dimerization of these
receptors. For example, using standard techniques, a
Don-1 EGF polypeptide can be mutated and tested in the
p185 assay described herein. Any of these mutant
polypeptides that bind to the receptor with high
affinity, but do not cause phosphorylation and/or
dimerization, are candidates for anti-tumor therapy.

Therapeutic Don-1 polypeptides, antibodies, or
small molecules of the invention can be administered by
any appropriate route, e.g., injection or infusion by
intravenous, intraperitoneal, intracerebral,
intramuscular, intraocular, intraarterial, or
intralesional routes, or by sustained release systems as
noted below. Don-1 is administered continuously by
infusion or by bolus injection. Don-1 antibodies are
administered in the same fashion, or by administration
into the blood stream or lymph. Treatment is repeated as
necessary for alleviation of disease symptoms.

Suitable examples of sustained-release
preparations include semipermeable matrices of solid
hydrophobic polymers containing the protein, which
matrices are in the form of shaped articles, e.g., films
or microcapsules. Examples of sustained-release matrices
include polyesters, hydrogels (e.g.,
poly(2-hydroxyethyl-methacrylate) as described by Langer
et al., J. Biomed. Mater. Res., 15:167-277 (1981), and
Langer, Chem. Tech., 12:98-105 (1982), or
polyvinylalcohol), or polylactides (as described in U.S.
Pat. No. 3,773,919, and EPA 58,481).

Sustained-release Don-1 polypeptide or antibody
compositions also include liposomally entrapped Don-1 or
Don-1 antibodies. Liposomes containing Don-1 or antibody
are prepared by methods known per se. See, e.g., Epstein
et al., P.N.A.S., USA, 82:3688-3692 (1985); Hwang et al.,
P.N.A.S., USA, 77:4030-4034 (1980); and U.S. Pat. Nos.
4,485,045 and 4,544,545. The liposomes are preferably
about 200-800 Angstroms in diameter and are unilamellar.
The lipid content is generally greater than about 30 mol.
percent cholesterol, the selected proportion being
adjusted for the optimal Don-1 therapy. Liposomes with
enhanced circulation time are disclosed in U.S. Pat. No.
5,013,556.

An effective amount of Don-1 or Don-1 antibody to
be employed therapeutically will depend, for example,
upon the therapeutic objectives, the route of
administration, and the condition of the patient.
Accordingly, it will be necessary for the therapist to
titer the dosage and modify the route of administration
as required to obtain the optimal therapeutic effect. A
typical daily dosage might range from about 1.0 µg/kg to
about 100 mg/kg or more, depending on the factors
mentioned above. Typically, the clinician will
administer Don-1 or Don-1 antibody until a dosage is
reached that achieves the desired effect. The progress
of this therapy is easily monitored by conventional
assays.
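
The weight-based range quoted above spans five orders of
magnitude, which is easy to misread when units are mixed.
The following sketch is unit arithmetic only, not dosing
guidance; it reads the lower bound as 1.0 µg/kg and the
upper bound as 100 mg/kg, and the function name and the
70 kg example weight are assumptions made for illustration.

```python
from __future__ import annotations

MICROGRAMS_PER_MILLIGRAM = 1000.0

LOW_DOSE_UG_PER_KG = 1.0     # lower bound quoted in the text, in µg/kg
HIGH_DOSE_MG_PER_KG = 100.0  # upper bound quoted in the text, in mg/kg


def daily_dose_range_mg(body_weight_kg: float) -> tuple[float, float]:
    """Convert the quoted per-kilogram range to total milligrams per day."""
    if body_weight_kg <= 0:
        raise ValueError("body weight must be positive")
    low_mg = LOW_DOSE_UG_PER_KG * body_weight_kg / MICROGRAMS_PER_MILLIGRAM
    high_mg = HIGH_DOSE_MG_PER_KG * body_weight_kg
    return low_mg, high_mg


# Hypothetical 70 kg subject, used only to show the unit conversion.
low, high = daily_dose_range_mg(70.0)
print(f"{low:.2f} mg to {high:.0f} mg per day")
```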

Diagnostic Applications 
The polypeptides of the invention and the
antibodies specific for these polypeptides are also
useful for identifying those compartments of mammalian
cells that contain proteins important to the function of
Don-1. Antibodies specific for Don-1 can be produced as
described above. The normal subcellular location of the
protein is then determined either in situ or using
fractionated cells by any standard immunological or
immunohistochemical procedure (see, e.g., Ausubel et al.,
supra; Bancroft and Stevens, Theory and Practice of
Histological Techniques, Churchill Livingstone, 1982).

Antibodies specific for Don-1 also can be used to
detect or monitor Don-1-related diseases. For example,
levels of a Don-1 protein in a sample can be assayed by
any standard technique using these antibodies. For
example, Don-1 protein expression can be monitored by
standard immunological or immunohistochemical procedures
(e.g., those described above) using the antibodies
described herein. Alternatively, Don-1 expression can be
assayed by standard Northern blot analysis or can be
aided by PCR (see, e.g., Ausubel et al., supra; PCR
Technology: Principles and Applications for DNA
Amplification, ed., H.A. Ehrlich, Stockton Press, NY).
If desired or necessary, analysis can be carried out to
detect point mutations in the Don-1 sequence (for
example, using well known nucleic acid mismatch detection
techniques). All of the above techniques are enabled by
the Don-1 sequences described herein.

Examples

Example 1 describes the identification and
sequencing of several cDNAs corresponding to different
splice variants of murine and human don-1 genes. Example
2 describes the characterization of Don-1 using a p185
assay, and differential expression pattern experiments.
Example 3 describes chromosomal mapping of the don-1
gene.

Example 1: Cloning of the don-1 Gene

The gene for murine Don-1 was identified in a
mouse choroid plexus cDNA library. The first murine
splice variant of the don-1 gene was used to identify an
additional murine splice variant in a mouse lung cDNA
library and two splice variants of the human don-1 gene
in a human fetal lung cDNA library. The identification
and sequencing of both murine and human genes is
described in this first example.
cDNA Library Screening 

To obtain a full length cDNA sequence, a mouse
lung library (Stratagene, La Jolla, CA) was screened
using the 1.4 kb NotI/SalI fragment originally isolated
from a choroid plexus library as described below.
Screening protocols were as described by Sambrook et al.,
Molecular Cloning: A Laboratory Manual, 2nd ed. (Cold
Spring Harbor Press, 1989). A homologous human sequence
was obtained from a human fetal brain library (Clontech,
Palo Alto, CA) by hybridization with a 1.4 kb NotI/SalI
fragment of the murine cDNA of SEQ ID NO:1 as described
above.

Choroid-Plexus mRNA Isolation

The murine mRNA used to create the murine choroid
plexus library was prepared as follows. Total RNA was
isolated from mouse choroid plexus tissue using the
guanidinium isothiocyanate/CsCl method of Chirgwin et al.
(Biochemistry 18:5294, 1979) as described in Current
Protocols for Molecular Biology (supra). The RNA was
quantitated, diluted to 1 mg/ml in water, and then
incubated for 30 minutes at 37°C with an equal volume of
DNase solution (20 mM MgCl2, 2 mM DTT, 0.1 units DNase,
0.6 units RNase inhibitor in TE) to remove contaminating
DNA. The RNA was then extracted with
phenol/chloroform/isoamyl, and ethanol precipitated.
After quantitation at 260 nm, an aliquot was
electrophoresed to check the integrity of the RNA. Next,
Poly A+ RNA was isolated using an Oligotex-dT kit from
Qiagen (Chatsworth, CA) as described by the manufacturer.
After quantitation, the mRNA was precipitated in ethanol
and resuspended at a concentration of 1 mg/ml in water.

Choroid plexus mRNA was used as a template for
preparation of cDNA according to the method of Gubler et
al. (Gene 25:263, 1983) using a Superscript Plasmid cDNA
synthesis kit (Life Technologies; Gaithersburg, MD). The
cDNA obtained was ligated into the NotI/SalI sites of
the mammalian expression vector pMET7, a modified version
of pME18S, which utilizes the SRα promoter as described
previously (Takebe, Mol. Cell. Biol. 8:466, 1988).
Ligated cDNA was transformed into electrocompetent DH10B
E. coli either prepared by standard procedures or
obtained from Life Technologies.

DNA Preparation and Sequence Analysis
A cDNA clone from the murine choroid plexus
library was sequenced to identify sequences of interest.
The identified sequence was then used to clone and
sequence a second murine splice variant of the don-1
gene. The identification and analysis were performed as
follows.

First, 96-well plates were inoculated with
individual choroid plexus library transformants in 1 ml
of LB-amp. These inoculations were based on the titers
of the cDNA transformants. The resulting cultures were
grown for 15 to 16 hours at 37°C with aeration. Prior to
DNA preparation, 100 µl of cell suspension was removed
and added to 100 µl of 50% glycerol, mixed and stored at
-80°C (glycerol freeze plate). DNA was then prepared
using the Wizard miniprep system (Promega; Madison, WI)
employing modifications for a 96-well format.

The insert cDNAs of a number of clones were
sequenced by standard, automated fluorescent
dideoxynucleotide sequencing using dye-primer chemistry
(Applied Biosystems, Inc.; Foster City, CA) on Applied
Biosystems 373 and 377 sequenators (Applied Biosystems).
The primer used in this sequencing was proximal to the
SRα promoter of the vector and therefore selective for
the 5' end of the clones, although other primers with
this selectivity can also be used. The short cDNA
sequences obtained in this manner were screened as
follows.

First, each sequence was checked to determine if
it was a bacterial, ribosomal, or mitochondrial
contaminant. Such sequences were excluded from the
subsequent analysis. Second, sequence artifacts, such as
vector and repetitive elements, were masked and/or
removed from each sequence. Third, the remaining
sequences were searched against a copy of the GenBank
nucleotide database using the BLASTN program (BLASTN
1.3MP: Altschul et al., J. Mol. Biol. 215:403, 1990).
Fourth, the sequences were analyzed against a
non-redundant protein database with the BLASTX program
(BLASTX 1.3MP: Altschul et al., supra). This protein
database is a combination of the Swiss-Prot, PIR, and
NCBI GenPept protein databases. The BLASTX program was
run using the default BLOSUM-62 substitution matrix with
the filter parameter: "xnu+seg". The score cutoff
utilized was 75.

Assembly of overlapping clones into contigs was
done using the program Sequencher (Gene Codes Corp.; Ann
Arbor, MI). The assembled contigs were analyzed using
the programs in the GCG package (Genetic Computer Group,
University Research Park, 575 Science Drive, Madison, WI
53711).
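
The screening steps that precede contig assembly
(discarding flagged contaminants, then keeping database
hits that meet the score cutoff of 75) can be sketched as a
simple filter. This is a minimal illustration, not the
pipeline actually used: the Hit record, the filter_hits
function, and the example identifiers are hypothetical, and
treating the cutoff as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set

SCORE_CUTOFF = 75  # BLASTX score cutoff stated in the text


@dataclass
class Hit:
    """One database hit for a query sequence (hypothetical record layout)."""
    query_id: str
    subject: str
    score: float


def filter_hits(hits: Iterable[Hit],
                contaminant_queries: Set[str],
                cutoff: float = SCORE_CUTOFF) -> List[Hit]:
    """Drop hits from flagged contaminant queries and hits below the cutoff.

    The cutoff is treated as inclusive (score >= cutoff), which is an
    assumption; the text only states that the cutoff was 75.
    """
    return [
        h for h in hits
        if h.query_id not in contaminant_queries and h.score >= cutoff
    ]


# Invented example data, for illustration only.
hits = [
    Hit("clone_A", "heregulin-like", 120.0),
    Hit("clone_B", "ribosomal protein", 300.0),
    Hit("clone_C", "unknown", 40.0),
]
kept = filter_hits(hits, contaminant_queries={"clone_B"})
print([h.query_id for h in kept])
```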

The above-described analysis resulted in the
identification of a secreted, murine clone having an open
reading frame of 139 amino acids. The protein encoded by
this clone was named "murine Don-1." The amino-terminal
portion of murine Don-1 has significant homology to the
known heregulin gene. This portion is 41% identical to
human heregulin based on a primary sequence alignment of
the Ig and EGF domains of murine Don-1 with human
heregulin.

This first splice variant of murine Don-1 was used 
as a probe to obtain an additional murine splice variant. 

Splice variants of the human don-1 gene were
isolated in the same way from human fetal brain and fetal
lung cDNA libraries (Clontech, Palo Alto, CA).

Example 2: Characterization of Don-1

The function of Don-1 polypeptide in a p185 assay
and the expression pattern of Don-1 were examined as
described below. Also described below is the expression
of a recombinant form of soluble murine Don-1.

p185 Assay

MDA-MB453 cells (ATCC, Rockville, MD) were grown
to 80% confluence in DMEM supplemented with 10% FCS in a
humidified atmosphere of 5% CO2 at 37°C. The cells were
then replated in serum-free media for 24 hours before
being exposed to NDF (100 ng/mL), EGF (100 ng/mL), or
transfected 293Ebna-conditioned media (10%) for 15
minutes at 37°C. Cell lysates were prepared by
solubilizing cells in buffer (1% Triton X-100, 0.5%
deoxycholate, 150 mM NaCl, 20 mM Tris pH 8.0, 1 mM EDTA,
30 mM Na4P2O7, 50 mM NaF, 0.1 mM Na3VO4, 10 µg/mL
aprotinin, and 1 mM PMSF); and 100 µg of protein was
separated on a 10% SDS PAGE gel. Following transfer to
nitrocellulose, immunodetection of phosphorylated
proteins was performed using the monoclonal
antiphosphotyrosine antibody 4G10 (Upstate Biotechnology,
NY) as described by the manufacturer and utilizing
Enhanced Chemiluminescence (ECL) (Amersham). NDF and EGF
were purchased from R&D Systems (Minneapolis, MN).

Analysis of phosphorylated proteins by Western
blotting revealed a robust induction of the 185 kDa
protein in cells induced with NDF and in cells treated
with Don-1 EGF-transfected 293Ebna cells. The level of
induction seen with Don-1 EGF was comparable to
saturating amounts of NDF and represented an approximate
ten-fold increase in phosphorylation over uninduced
cells. No induction of phosphorylation was observed in
cells treated with EGF or the conditioned media of
mock-transfected 293Ebna cells. This result demonstrates that
Don-1 binds and activates a known member of the EGFR
family, p185.

Analysis of Don-1 Expression

Northern Analysis

Northern analysis was used to examine Don-1
expression as follows. Mouse and human multiple tissue
northern blots purchased from Clontech (Palo Alto, CA)
were hybridized, according to manufacturer's directions,
to a 1.4 kb Not/Sal fragment of murine Don-1 polypeptide
SEQ ID NO:1, or to the 200 base-pair region encoding the
EGF domain which extends from about amino acid location
104 to about amino acid location 140 of SEQ ID NO:1.

This Northern analysis revealed that Don-1 appears
to be highly expressed in the mouse brain, although
multiple transcripts were also observed in the spleen and
lung. The message is also differentially expressed
throughout embryogenesis, indicating a possible role in
development. In all positive tissues, multiple
transcripts exist, the major sizes being about 4 kb and
about 3 kb.

Human tissue Northern blots showed that human
Don-1 is highly expressed in fetal brain and fetal lung
tissues. In addition, two transcripts of about 4 kb and
3 kb were detected exclusively in the cerebellum of human
adult tissue. No other normal adult human tissues
appeared to express human Don-1. However, Don-1
transcripts were detected in a human colon adenocarcinoma
cell line SW480 and in a human melanoma cell line G361.
In these tissues there were two major Don-1 transcripts
of about 4.4 kb and about 3 kb each.
In Situ Analysis

In situ hybridizations were also used to examine
Don-1 expression. Tissues for these hybridizations were
prepared as follows. Four to six week old C57BL/6 mice
were cervically dislocated, and their brains were removed
and frozen on dry ice. Ten µm coronal frozen sections of
brain were post-fixed with 4% formaldehyde in 1x
phosphate buffered saline (PBS) (25°C) for 10 minutes,
rinsed two times in 1x PBS, rinsed once in 1 M
triethanolamine-HCl (pH 8), and then incubated in 0.25%
acetic anhydride/1 M triethanolamine-HCl for 10 minutes.
Sections were then rinsed in 2x SSC. Tissue was
dehydrated through a series of ethanol washes, 70%
ethanol for 1 minute, 80% for 1 minute, 95% for 2
minutes, and 100% ethanol for 1 minute. Sections were
then incubated in 100% chloroform for 5 minutes and
rinsed in 95% ethanol for 1 minute and 100% ethanol for 1
minute. Sections were air dried for 20 minutes.

Hybridizations were performed with 35S-radiolabeled
(5 x 10^ cpm/ml) cRNA probes encoding a 472
bp segment of the 5' end of the murine Don-1 gene (SEQ ID
NO:1, nucleotides 68-540). Probes were incubated in the
presence of 600 mM NaCl, 10 mM Tris, pH 7.5, 1 mM EDTA,
0.01% sheared herring sperm, 0.01% yeast tRNA, 0.05%
total yeast sRNA Type XI, 1x Denhardt's solution, 50%
formamide, 10% dextran sulfate, 100 mM DTT, 0.1% SDS, and
0.1% Na thiosulfate for 18 hours at 55°C.

After hybridization, slides were washed with 2x
SSC. Sections were then incubated with 10 mM Tris-HCl
(pH 7.6)/500 mM NaCl/1 mM EDTA (TNE) at 37°C for 10
minutes, incubated in 10 μg/ml RNase A in TNE at 37°C for
30 minutes, and washed in TNE at 37°C for 30 minutes.
Sections were then rinsed with 2x SSC at room
temperature, then incubated with 2x SSC at 50°C for 1
hour, rinsed and incubated with 0.2x SSC at 55°C for 1
hour, and then incubated with 0.2x SSC at 60°C for 1
hour. Sections were then dehydrated through a series of
ethanols, 50%, 70%, 80%, and 90% with 0.3 M NH4OAc, and
100% ethanol. Sections were air dried and placed on
Kodak Biomax MR scientific imaging film for 7 days at
room temperature.

mRNA transcripts were localized to the cerebellum
and Ammon's horn. Controls for the in situ hybridization
experiments included the use of a sense probe, which
showed no signal above background levels, and RNase-
treated tissue, which showed a significantly reduced
signal.

Expression Cloning

The EGF domain and flanking amino acids (amino
acids 85-154 of SEQ ID NO:1) were amplified by PCR and
then subcloned into a variety of commercially available
bacterial expression vectors including pGEX (Pharmacia,
Uppsala, Sweden), pMAL (NEB, Beverly, MA) and pTRX
(Invitrogen, San Diego, CA). Purification of recombinant
material was performed as described by the manufacturer.
This same domain was also subcloned into a mammalian
expression vector, PN8E, and then transfected into 293Ebna
cells as detailed by Gibco-BRL (Gaithersburg, MD). A
leader sequence (MALPVTALLLPLALLLHAARP; SEQ ID NO: 24) was
fused to the N-terminus of the EGF domain by PCR and a
Flag epitope tag was placed on the C-terminus, prior to
subcloning into PN8E (Ho et al., P.N.A.S. USA, 90:11267-
11271, 1993).

293Ebna cells at 80 percent confluence in 6-well
dishes were transfected with 1.0 μg DNA in 10 μl
lipofectamine (Gibco-BRL, Gaithersburg, MD) for 5 hours
at 37°C in 5 percent CO2 in an 800 μl final volume.
Following incubation, DMEM and 10 percent Fetal Calf
Serum were added, and the media was replaced 24 hours
after the start of transfection. Culture supernatant was
collected 48 hours later.

Preparation of Soluble Don-1

Soluble forms of recombinant murine or human Don-1,
or domains thereof, can be produced in bacteria using
the pGEX expression system as described above for the EGF
domain of SEQ ID NO:1. The pGEX-Don-1 is purified on
glutathione agarose and the Don-1 moiety released by
thrombin digestion. Following endotoxin removal on an
Endotoxin BX column (Cape Cod Associates, Falmouth, MA),
the Don-1 preparation is determined to contain low levels
of endotoxin (<0.01 EU/ml) by the Limulus amebocyte
lysate (LAL) assay (Cape Cod Associates).

Recombinant, soluble Don-1 is produced as follows.
First, the murine Don-1 cDNA is amplified with a primer
corresponding to a sequence at the 5' end of the sequence
encoding, for example, the EGF domain (5' primer). The
5' primer, 5'-AAAAAAGAATTCCTCCATGTCAACAGCGTG-3' (SEQ ID
NO: 25), has an EcoRI restriction enzyme cleavage site
followed by 18 nucleotides encoding the 5' flanking
region of the EGF domain of murine Don-1. The 3' primer
used was 5'-TCCTCTCTCGAGTCACTTAGGATCTGGCATGTA-3' (SEQ ID
NO: 26). This primer has complementary sequences encoding
amino acids 187 to 192 preceded by a termination codon
and XhoI site.
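
The layout of the two cloning primers can be checked mechanically. The following is a minimal sketch and not part of the original disclosure: the helper names `reverse_complement` and `dissect_primer` are hypothetical, and the recognition sequences are the standard ones for EcoRI (GAATTC) and XhoI (CTCGAG). It splits each primer into the 5' clamp, the restriction site, and the tail that follows the site.

```python
# Illustrative helpers (hypothetical names, not from the patent text).
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Standard recognition sequences for the two enzymes named in the text.
SITES = {"EcoRI": "GAATTC", "XhoI": "CTCGAG"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def dissect_primer(primer: str) -> dict:
    """Split a primer into 5' clamp, restriction site, and the tail after it."""
    primer = primer.upper()
    for enzyme, site in SITES.items():
        pos = primer.find(site)
        if pos != -1:
            return {
                "enzyme": enzyme,
                "clamp": primer[:pos],
                "site": site,
                "tail": primer[pos + len(site):],
            }
    raise ValueError("no EcoRI or XhoI site found")


forward = dissect_primer("AAAAAAGAATTCCTCCATGTCAACAGCGTG")     # SEQ ID NO: 25
reverse = dissect_primer("TCCTCTCTCGAGTCACTTAGGATCTGGCATGTA")  # SEQ ID NO: 26

print(forward["enzyme"], len(forward["tail"]))
print(reverse["enzyme"], reverse_complement(reverse["tail"]))
```

Run as written, the sketch finds an 18-nucleotide tail after the EcoRI site of the 5' primer. The reverse complement of the 3' primer tail ends in TGA, consistent with the termination codon described in the text.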

These primer pairs were used for PCR amplification
using the following conditions: 94°C for 30 seconds;
55°C for 30 seconds and 72°C for 90 seconds with 30
cycles. The resulting PCR product was cloned into the
GST fusion protein vector pGEX (Pharmacia, Uppsala,
Sweden). The fusion protein was produced in E. coli and
purified according to the protocol supplied by the
manufacturer. The Don-1 construct produced a protein of
approximately 7.0 kD after the cleavage of GST by
thrombin.
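
The cycling conditions given for this amplification can be written down as data, which makes the programmed run time easy to estimate. This is only a sketch under stated assumptions: the `Step` class and `cycling_seconds` function are hypothetical names, and the estimate ignores ramp times and any initial denaturation or final extension, none of which are specified in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One hold in a thermal cycle (temperature in deg C, duration in seconds)."""
    temp_c: float
    seconds: int


def cycling_seconds(steps, cycles: int) -> int:
    """Total programmed hold time, excluding ramping between temperatures."""
    return cycles * sum(step.seconds for step in steps)


# 94 C for 30 s, 55 C for 30 s, 72 C for 90 s, repeated for 30 cycles.
profile = [Step(94, 30), Step(55, 30), Step(72, 90)]
total = cycling_seconds(profile, cycles=30)
print(f"{total} s ({total / 60:.0f} min) of programmed hold time")
```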



Example 3: Mapping of the don-1 Gene

These examples describe chromosome mapping of the
mouse and human don-1.

Mouse Chromosome Mapping

The don-1 gene was mapped to the proximal end of
chromosome 18 in the mouse, utilizing a Mus
spretus/C57BL/6J backcross panel. Don-1 appears to be
located close to cdc25, 17 cM from the top of chromosome
18, between the markers D18Mit20 and D18Mit24.

PCR primers were used to amplify mouse genomic DNA
using standard techniques. Primers were designed from
noncoding sequences of murine don-1 and were as follows:

Forward primer: 5'-AGAGGAAGGCCAAAGTAGTG-3' (SEQ
ID NO: 33), and

Reverse primer: 5'-GTGGACCACAAGGTAAACAG-3' (SEQ
ID NO: 34).

Other potential primers include:

Forward primer: 5'-CACAGTCCACCCCTCAG-3' (SEQ ID
NO: 27), and

Reverse primer: 5'-GCTCTGGTAAGCAAACATGG-3' (SEQ
ID NO: 28).

Amplification conditions were 30 cycles at 95°C
for 1 minute, 60°C for 1 minute, and 72°C for 45 seconds.
Samples were run on a nondenaturing 10% acrylamide SSCP gel
at 20 W and 4°C for 2.5 hours.
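
As a rough plausibility check on the mapping primers, their GC content and an approximate melting temperature can be computed. The text does not say how the primers were designed. The sketch below assumes the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is only a coarse estimate for short oligonucleotides. The function names are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Approximate Tm in deg C by the Wallace rule: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


primers = {
    "forward (SEQ ID NO: 33)": "AGAGGAAGGCCAAAGTAGTG",
    "reverse (SEQ ID NO: 34)": "GTGGACCACAAGGTAAACAG",
}
for name, seq in primers.items():
    print(name, len(seq), f"{gc_fraction(seq):.0%}", wallace_tm(seq))
```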

Human Chromosome Mapping

Human don-1 can be mapped to a particular
chromosome by using a panel of radiation hybrids in a
manner similar to that described for the mouse chromosome
mapping.

The following primers are used to amplify human
genomic DNA from a panel of radiation hybrids (Genebridge
4, Research Genetics, Huntsville, AL):

Forward primer: 5'-TGTGAACTCCTCTGGCCTGT-3' (SEQ
ID NO: 29), and

Reverse primer: 5'-GAAGGGGCTGGGCATTTAAT-3' (SEQ
ID NO: 30).

The amplification profile is as follows: 94°C for
30 seconds; BS^C for 30 seconds, and 72°C for 45 seconds
with 30 cycles. Samples are resolved on a 1% agarose TAE
gel.

Deposit of Microorganisms

The following microorganisms were deposited with
the American Type Culture Collection (ATCC), Rockville,
Maryland, on July 3, 1996 and assigned the indicated
accession number:

Microorganism                                       ATCC Accession No.

E. coli CpmDon-1a (membrane-bound murine Don-1)     98096
E. coli CpmDon-1b (membrane-bound human Don-1)      98097
E. coli CpmDon-2 (secreted murine Don-1)            98098

Deposit Statement

The subject cultures have been deposited under
conditions that assure that access to the cultures will
be available during the pendency of the patent
application to one determined by the Commissioner of
Patents and Trademarks to be entitled thereto under 37
CFR 1.14 and 35 USC 122. The deposits are available as
required by foreign patent laws in countries wherein
counterparts of the subject application, or its progeny,
are filed. However, it should be understood that the
availability of a deposit does not constitute a license
to practice the subject invention in derogation of patent
rights granted by governmental action.

Further, the subject culture deposits will be
stored and made available to the public in accord with
the provisions of the Budapest Treaty for the Deposit of
Microorganisms, i.e., they will be stored with all the
care necessary to keep them viable and uncontaminated for
a period of at least five years after the most recent
request for the furnishing of a sample of the deposits,
and in any case, for a period of at least 30 (thirty)
years after the date of deposit or for the enforceable
life of any patent which may issue disclosing the
cultures plus five years after the last request for a
sample from the deposit. The depositor acknowledges the
duty to replace the deposits should the depository be
unable to furnish a sample when requested, due to the
condition of the deposits. All restrictions on the
availability to the public of the subject culture
deposits will be irrevocably removed upon the granting of
a patent disclosing them.

Other Embodiments
The invention also features fragments, variants,
analogs and derivatives of the Don-1 polypeptides
described above that retain one or more of the biological
activities of Don-1, such as activation of receptor-type
tyrosine kinases as described herein.

The invention includes naturally-occurring and
non-naturally-occurring allelic variants. Compared to
the most common naturally-occurring nucleotide sequence
encoding Don-1, the nucleic acid sequence encoding
allelic variants may have a substitution, deletion, or
addition of one or more nucleotides. The preferred
allelic variants are functionally equivalent to
naturally-occurring Don-1.

It is to be understood that while the invention
has been described in conjunction with the detailed
description thereof, the foregoing description is
intended to illustrate and not limit the scope of the
invention, which is defined by the scope of the appended
claims. Other aspects, advantages, and modifications are
within the scope of the following claims.

SEQUENCE LISTING

(1) GENERAL INFORMATION

(i) APPLICANT: Millennium Biotherapeutics, Inc.

(ii) TITLE OF THE INVENTION: DON-1 GENE AND POLYPEPTIDES
AND USES THEREFOR

(iii) NUMBER OF SEQUENCES: 33

(iv) CORRESPONDENCE ADDRESS:
(A) ADDRESSEE: Fish & Richardson, P.C.
(B) STREET: 225 Franklin Street
(C) CITY: Boston
(D) STATE: MA
(E) COUNTRY: US
(F) ZIP: 02110-2804

(v) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Diskette
(B) COMPUTER: IBM Compatible
(C) OPERATING SYSTEM: Windows95
(D) SOFTWARE: FastSEQ for Windows Version 2.0

(vi) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: US/PCT97/
(B) FILING DATE: 18-AUG-1997
(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: 08/752,307
(B) FILING DATE: 19-NOV-1996

(A) APPLICATION NUMBER: 08/699,591
(B) FILING DATE: 19-AUG-1996

(viii) ATTORNEY/AGENT INFORMATION:
(A) NAME: Meiklejohn, Ph.D., Anita L.
(B) REGISTRATION NUMBER: 35,283
(C) REFERENCE/DOCKET NUMBER: 09404/022WO1

(ix) TELECOMMUNICATION INFORMATION:
(A) TELEPHONE: 617-542-5070
(B) TELEFAX: 617-542-8906
(C) TELEX: 200154



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2467 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 79...1893

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

CCTAACGOCA AAAACATCAA GAAAGAGGTC GGCAAGATCC TGTGCACTOA CTCCOCCACC 60
CGGCCCAAGC TGAAGAAG ATG AAG AGC CAG ACA GGA GAG GTG GGT GAG AAG 111
                    Met Lys Ser Gln Thr Gly Glu Val Gly Glu Lys
                    1               5                   10

CAG TOG CTC AAO TGT GAO CCA CCG CCC CGA AAC CCC CAC CCC TCC TAT 159
Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr
            15                  20                  25

CGC TGG TTC AAG GAT GCC AAG GAA CTC AAC CGG ACT OGT GAT ATT CGC 207
Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg
        30                  35                  40

ATC AAG TAT CGC AAT GTC AGA AAG AAC TCA CGG CTA CAG TTC AAC AAA 255
Ile Lys Tyr Gly Asn Val Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys
    45                  50                  55

GTC AOG OTO CAG GAT GCC GGG GAG TAC CTC TGT GAG GCC GAG AAC ATC 303
Val Arg Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile
60                  65                  70                  75

CTT CGG AAC GAC ACC CTG AGG GCC CGA CTC CAT CTC AAC AGC GTC ACC 351
Leu Gly Lys Asp Thr Val Arg Gly Arg Leu His Val Asn Ser Val Ser
                80                  85                  90

Th^ 7^ I^^ AAG TGC AAT GAG ACC 399
Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr
            95                  100                 105

GCC AAC TCC TAC TGT GTG AAT CGA CGC GTG TGC TAC TAC ATC GAC CGC 447
Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly
        110                 115                 120

ATC AAC CAG CTC TCC TGC AAA TGT CCA AAC CGA TTC TTC CGA CAG AGA 495
Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg
    125                 130                 135

TGT TTC GAG AAA CTG CCT TTG CGA TTG TAC ATG CCA GAT CCT AAG CAA 543
Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln
140                 145                 150                 155

AAC GCT GAO CAG CTG TAC CAG AAO AGA GTG CTG ACA ATT ACT GGT ATC 591
Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile
                160                 165                 170

TGT GTC CCC CTG CTG GTC GTG GCC ATC GTC TGT GTG GTC GCC TAC TGC 639
Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val Ala Tyr Cys
            175                 180                 185

AAO ACC AAA AAA CAG AGO AGG CAG ATG CAT CAT CAT CTC CGG CAG AAC 687
Lys Thr Lys Lys Gln Arg Arg Gln Met His His His Leu Arg Gln Asn
        190                 195                 200

ATC TGC CCA GCC CAC CAG AAC CGA AGC CTG GCC AAC CGC CCC ACC CAC 735
Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly Pro Ser His
    205                 210                 215

CCT CGG CTG GAC CCT GAC GAG ATC CAG ATC GCA GAT TAC ATC TCC AAA 783
Pro Arg Leu Asp Pro Glu Glu Ile Gln Met Ala Asp Tyr Ile Ser Lys
220                 225                 230                 235

i^l SI? ^'^^ ^ GAA GCT GAG ACC ACG 831
Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Ala Glu Thr Thr
                240                 245                 250

TTC TCT GGG ACC CAC TCC TGT TCA CCT TCT CAC CAC TGC TCC ACA GCC 879
Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys Ser Thr Ala
            255                 260                 265

™u « "'^'^ ''^^ GAG AGC CAC ACG TGG AGC CTG GAA 927
Thr Pro Thr Ser Ser His Arg His Glu Ser His Thr Trp Ser Leu Glu
        270                 275                 280

CCT TCA GAG ACC CTG ACC TCX3 GAT TCC CAG TCA GGC ATC ATG CTA TCA 975
Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile Met Leu Ser
    285                 290                 295

TCA GTA GGC ACC ACC AAG TGC AAC AGC CCA GCA TGT GTG GAG GCA CGG 1023
Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val Glu Ala Arg
300                 305                 310                 315

GCG CGG AGG GCA GCA GCC TAG AGC CAG GAG GAG CGG CGC AGG GCT GCC 1071
Ala Arg Arg Ala Ala Ala Tyr Ser Gln Glu Glu Arg Arg Arg Ala Ala
                320                 325                 330

ATC CCA CCC TAC CAT GAC TCC ATA GAC TCG CTG CGT GAC TCT CCA CAC 1119
Met Pro Pro Tyr His Asp Ser Ile Asp Ser Leu Arg Asp Ser Pro His
            335                 340                 345

ACT GAA AGG TAC GTG TCA GCC TTG ACC ACG CCC GCT CGC CTC TCO CCC 1167
Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg Leu Ser Pro
        350                 355                 360

GTG GAC TTC CAC TAC TCG CTG GCC ACG CAG GTG CCG ACT TTC GAG ATC 1215
Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr Phe Glu Ile
    365                 370                 375

ACG TCG CCC AAC TCT GAG CAT GCC GTG TCG CTG CCG CCC GCC COG CCC 1263
Thr Ser Pro Asn Ser Glu His Ala Val Ser Leu Pro Pro Ala Ala Pro
380                 385                 390                 395

ATC ACC TAC CGC CTG GCG GAG CAG CAG CCG CTC CTG CGG CAT CCA GCG 1311
Ile Ser Tyr Arg Leu Ala Glu Gln Gln Pro Leu Leu Arg His Pro Ala
                400                 405                 410

CCG CCC GGC CCG CGG CCG GGC TCG CGG CCC GCA GCG GAC ATG CAG CGC 1359
Pro Pro Gly Pro Gly Pro Gly Ser Gly Pro Gly Ala Asp Met Gln Arg
            415                 420                 425

AGC TAC GAC AGC TAC TAC TAC CCT GCG GCG GGG CCC GGG CCG CGG CGC 1407
Ser Tyr Asp Ser Tyr Tyr Tyr Pro Ala Ala Gly Pro Gly Pro Arg Arg
        430                 435                 440

AGC GCC TGC GCG CTG GGA GGC AGC TTG GCC AGC CTG CCC GCC AGC CCC 1455
Ser Ala Cys Ala Leu Gly Gly Ser Leu Gly Ser Leu Pro Ala Ser Pro
    445                 450                 455

TTC CCC ATC CCG GAG GAC GAC GAG TAC GAG ACC ACC CAC GAG TGC GCG 1503
Phe Arg Ile Pro Glu Asp Asp Glu Tyr Glu Thr Thr Gln Glu Cys Ala
460                 465                 470                 475

CCC CCC CCG CCG CCG CGG CCG CGC ACG CGC CGC GCG TCC CGC AGG ACG 1551
Pro Pro Pro Pro Pro Arg Pro Arg Thr Arg Gly Ala Ser Arg Arg Thr
                480                 485                 490

TCG GCG GGG CCG CCG CGC TCG CGG CGC TCC CGC CTC AAC GGG TTG GCG 1599
Ser Ala Gly Pro Arg Arg Trp Arg Arg Ser Arg Leu Asn Gly Leu Ala
            495                 500                 505

GCG CAG CGC GCA CCC GCG GCG CGG CAC TCG CTG TCA TTG AGC AGC GCT 1647
Ala Gln Arg Ala Arg Ala Ala Arg Asp Ser Leu Ser Leu Ser Ser Gly
        510                 515                 520

TCG GGC TGC CGC TCG GCG TCG GCC TCC GAC GAC GAC GCG GAC GAC GCG 1695
Ser Gly Cys Gly Ser Ala Ser Ala Ser Asp Asp Asp Ala Asp Asp Ala
    525                 530                 535

GAC GGG GOG CTG GCG GCC GAG AGC ACC CCA TTC CTC CGC CTG CGA GCG 1743
Asp Gly Ala Leu Ala Ala Glu Ser Thr Pro Phe Leu Gly Leu Arg Ala
540                 545                 550                 555

GCG CAC GAC GCG TTG CGC TCG GAC TCG CCG CCG CTG TGC CCC GCO GCC 1791
Ala His Asp Ala Leu Arg Ser Asp Ser Pro Pro Leu Cys Pro Ala Ala
                560                 565                 570

GAC AGC ACC ACT TAC TAC TCC CTG GAC AGC CAC AGC ACG CGC GCC AGC 1839
Asp Ser Arg Thr Tyr Tyr Ser Leu Asp Ser His Ser Thr Arg Ala Ser
            575                 580                 585

AGC AOA CAC AGC CGO GGO CCG CCC ACG AGG GCC AAG CAG GAC TCG GGG 1887
Ser Arg His Ser Arg Gly Pro Pro Thr Arg Ala Lys Gln Asp Ser Gly
        590                 595                 600

CCC CTC TAACCCCCCC CGCCTCGCCC GCCCCACGTC TCCAAGGAGA GCGGAOACCA CC 1945
Pro Leu



GACTGGAGAG GGAAAAGGAG CGAACAAAGA AATAAAAATA TTTTTATTTT CTATAAAAGG 2005 

AAAAAAGTAT AACAAAATCT TTTATTTTCA TTTTAGCAAA AAAAATTGTC TTATAATACT 2065 

AGCTAACCGC AAACACCTTT TTATAGGCAA ACTATTTATA TGTAACATCC TGATTTACAG 2125 

CTTCGGAAAA AAAAAAAGAA ACAACAAAAA AAAAAAAAAA AAAAACTCGA GGGGGGGCCC 2185 

GGTACCCAAT TCGCCCTATA GTGAGTCGTA TTACAATTCA CTCCCCOTCG TTTTACAACG 2245 

TOCTGACTGO GAAAACCCTO GCGTTACCCA ACTTAATCGC CTTGCAGCAC ATCCCCCTTT 2305 

CGCCAGCTGG CGTAATAGCG AAAAGCCCCG CACCGATCGC CCTTCCCAAC AGTTGCGCAG 2365 

CCTGAATGGC GAATGOCAAA TTGTAACCGT TAATATTTTG TTAAAATTCC CGTTAAATTT 2425 

TTGTTAAATC ACTCATTTTT TAACCAATAG GCCGAAATCG GC 2467 
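
The feature table for SEQ ID NO:1 gives the coding sequence as positions 79...1893, and SEQ ID NO:2 is listed as 605 amino acids. The sketch below is an illustrative consistency check under the assumption of the standard genetic code. The names `codon_count` and `translate` are hypothetical and are not part of the listing. It counts the codons spanned by the stated location and translates the first eleven codons shown at the start of SEQ ID NO:1.

```python
# Standard genetic code, indexed in TCAG order (a common compact encoding).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive coding-sequence location."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError(f"{length} nt is not a whole number of codons")
    return length // 3


def translate(dna: str) -> str:
    """Translate an in-frame DNA string into one-letter amino acid codes."""
    dna = dna.upper().replace(" ", "")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


print(codon_count(79, 1893))
print(translate("ATG AAG AGC CAG ACA GGA GAG GTG GGT GAG AAG"))
```

The same `codon_count` call can be applied to the location given for SEQ ID NO:3 (79...621) to compare against the length stated for SEQ ID NO:4.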

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 605 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:



Met Lys Ser Gln Thr Gly Glu Val Gly Glu Lys Gln Ser Leu Lys Cys
1               5                   10                  15

Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr Arg Trp Phe Lys Asp
            20                  25                  30

Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg Ile Lys Tyr Gly Asn
        35                  40                  45

Val Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys Val Arg Val Glu Asp
    50                  55                  60

Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile Leu Gly Lys Asp Thr
65                  70                  75                  80

Val Arg Gly Arg Leu His Val Asn Ser Val Ser Thr Thr Leu Ser Ser
                85                  90                  95

Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr Ala Lys Ser Tyr Cys
            100                 105                 110

Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly Ile Asn Gln Leu Ser
        115                 120                 125

Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg Cys Leu Glu Lys Leu
    130                 135                 140

Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln Lys Ala Glu Glu Leu
145                 150                 155                 160

Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Val Ala Leu Leu
                165                 170                 175

Val Val Gly Ile Val Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln
            180                 185                 190

Arg Arg Gln Met His His His Leu Arg Gln Asn Met Cys Pro Ala His
        195                 200                 205

Gln Asn Arg Ser Leu Ala Asn Gly Pro Ser His Pro Arg Leu Asp Pro
    210                 215                 220

Glu Glu Ile Gln Met Ala Asp Tyr Ile Ser Lys Asn Val Pro Ala Thr
225                 230                 235                 240

Asp His Val Ile Arg Arg Glu Ala Glu Thr Thr Phe Ser Gly Ser His
                245                 250                 255

Ser Cys Ser Pro Ser His His Cys Ser Thr Ala Thr Pro Thr Ser Ser
            260                 265                 270

His Arg His Glu Ser His Thr Trp Ser Leu Glu Arg Ser Glu Ser Leu
        275                 280                 285

Thr Ser Asp Ser Gln Ser Gly Ile Met Leu Ser Ser Val Gly Thr Ser
    290                 295                 300

Lys Cys Asn Ser Pro Ala Cys Val Glu Ala Arg Ala Arg Arg Ala Ala
305                 310                 315                 320

Ala Tyr Ser Gln Glu Glu Arg Arg Arg Ala Ala Met Pro Pro Tyr His
                325                 330                 335

Asp Ser Ile Asp Ser Leu Arg Asp Ser Pro His Ser Glu Arg Tyr Val
            340                 345                 350

Ser Ala Leu Thr Thr Pro Ala Arg Leu Ser Pro Val Asp Phe His Tyr
        355                 360                 365

Ser Leu Ala Thr Gln Val Pro Thr Phe Glu Ile Thr Ser Pro Asn Ser
    370                 375                 380

Glu His Ala Val Ser Leu Pro Pro Ala Ala Pro Ile Ser Tyr Arg Leu
385                 390                 395                 400

Ala Glu Gln Gln Pro Leu Leu Arg His Pro Ala Pro Pro Gly Pro Gly
                405                 410                 415

Pro Gly Ser Gly Pro Gly Ala Asp Met Gln Arg Ser Tyr Asp Ser Tyr
            420                 425                 430

Tyr Tyr Pro Ala Ala Gly Pro Gly Pro Arg Arg Ser Ala Cys Ala Leu
        435                 440                 445

Gly Gly Ser Leu Gly Ser Leu Pro Ala Ser Pro Phe Arg Ile Pro Glu
    450                 455                 460

Asp Asp Glu Tyr Glu Thr Thr Gln Glu Cys Ala Pro Pro Pro Pro Pro
465                 470                 475                 480

Arg Pro Arg Thr Arg Gly Ala Ser Arg Arg Thr Ser Ala Gly Pro Arg
                485                 490                 495

Arg Trp Arg Arg Ser Arg Leu Asn Gly Leu Ala Ala Gln Arg Ala Arg
            500                 505                 510

Ala Ala Arg Asp Ser Leu Ser Leu Ser Ser Gly Ser Gly Cys Gly Ser
        515                 520                 525

Ala Ser Ala Ser Asp Asp Asp Ala Asp Asp Ala Asp Gly Ala Leu Ala
    530                 535                 540

Ala Glu Ser Thr Pro Phe Leu Gly Leu Arg Ala Ala His Asp Ala Leu
545                 550                 555                 560

Arg Ser Asp Ser Pro Pro Leu Cys Pro Ala Ala Asp Ser Arg Thr Tyr
                565                 570                 575

Tyr Ser Leu Asp Ser His Ser Thr Arg Ala Ser Ser Arg His Ser Arg
            580                 585                 590

Gly Pro Pro Thr Arg Ala Lys Gln Asp Ser Gly Pro Leu
        595                 600                 605









(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1607 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 79...621

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

CCTAACGGCA AAAACATCAA GAAAGAGGTG GGCAAGATCC TGTGCACTGA CTGCGCCACC 60
CGGCCCAAGC TGAAGAAG ATG AAG AGC CAG ACA GGA GAG GTG GGT GAG AAO 111
                    Met Lys Ser Gln Thr Gly Glu Val Gly Glu Lys
                    1               5                   10

CAO TCQ CTC AAG TOT GAG GCA GCG GCG GGA AAC CCC CAG CCC TCC TAT 159
Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr
            15                  20                  25

CGC TGG TTC AAO GAT GGC AAG GAA CTC AAC COG AGT CGT GAT ATT CGC 207
Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg
        30                  35                  40

ATC AAG TAT GGC AAT GTC ACA AAG AAC TCA CGG CTA CAG TTC AAC AAA 255
Ile Lys Tyr Gly Asn Val Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys
    45                  50                  55

GTG AOG GTG GAG GAT GCC GGG GAG TAC GTC TOT GAG OCC GAG AAC ATC 303
Val Arg Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile
60                  65                  70                  75

CTT GGG AAG GAC ACC GTG AGG GGC CGA CTC CAT GTC AAC AGC GTG AGC 351
Leu Gly Lys Asp Thr Val Arg Gly Arg Leu His Val Asn Ser Val Ser
                80                  85                  90

ACC ACT CTO TCA TCC TGG TCG GGA CAT GCC CGG AAG TCC AAT GAG ACC 399
Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr
            95                  100                 105

GCC AAG TCC TAC TGT GTG AAT CGA GGC GTG TGC TAC TAC ATC GAC GGC 447
Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly
        110                 115                 120

ATC AAC CAG CTC TCC TCC AAA TGT CCA AAC GGA TTC TTC GGA CAG AGA 495
Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg
    125                 130                 135

TGT TTG GAG AAA CTG CCT TTG CGA TTG TAC ATG CCA GAT CCT AAG CAA 543
Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln
140                 145                 150                 155

AOT GTC CTG TOO GAT ACA CCO GGG ACA OGT GTC AGC AGT TCG CAA TOG 591
Ser Val Leu Trp Asp Thr Pro Gly Thr Gly Val Ser Ser Ser Gln Trp
                160                 165                 170

TCA ACT TCT CCA AGC ACC TTG GAT TTG AAT TGAAGGAGGC TGAGGAGCTG TAC 644
Ser Thr Ser Pro Ser Thr Leu Asp Leu Asn
            175                 180

CAGAAGAGAQ TGCTGACAAT TACTGGTATC TCTGTGGCCC TGCTGGTCGT GGCCATCOTC 704
TCTGTGGTCG CCTACTOCAA GACCAAAAAA CAGAGGAGGC AGATGCATCA TCATCTCCGG 764
CACAACATGT GCCCAGCCCA CCAGAACCGA ACCCTGGCCA ACGGGCCCAG CCACCCTCGG 824
CTOGACCCTG AGGAGATCCA GATGGCAGAT TACATCTCCA AAAATGTGCC AGCTACAGAC 884
CACGTGATCC GGAGGGAAGC TGAGACCACG TTCTCTGGGA GCCACTCCTG TTCACCTTCT 944
CACCACTGCT CCACAGCCAC GCCCACCTCC AGCCACAGAC ATGAGAGCCA CACGTGCAGC 1004
CTOGAACGTT CAGAGAGCCT GACCTCGGAT TCCCAGTCAG GCATCATGCT ATCATCAOTA 1064
GGCACCAGCA AGTGCAACAG CCCACCATGT GTGGAGGCAC GGGCGCGGA6 GGCAGCACCC 1124
TACAGCCAGG AGGAGCGGCG CAGGGCTGCC ATGCCACCCT ACCATGACTC CATAGACTCG 1184
CTOCOTOACT CTCCACACAG TGAAAGGTAC GTGTCAGCCT TGACCACGCC CGCTCCCCTC 1244
TOGCCCOTOG ACTTCCACTA CTCGCTGGCC ACGCAGGTGC CGACTTTCGA GATCACGTCC 1304
CCCAACTCTO CGCATGCCGT GTCGCTGCCG CCCGCCGCGC CCATCAGCTA CCGCCTGGCG 1364
GAGCAGCAGC COCTCCTGCG OCATCCAGCG CCOCCCGGCC CGGGGCCGGG GTCGGGGCCC 1424
GGAGCGGACA TGCAGCGCAG CTACGACAGC TACTACTACC CTGCGGCGGG GCCCCOGCCG 1484
CGGCGCAGCG CCTGCGCCCT GGGAGGCAGC TTOGGCAGCC TGCCCGCCAG CCCCTTCCGC 1544
ATCCCGGACG ACGACGAGTA CGAGACCACG CAGGAGTGCG CGCCCCCGCC GCCGCCGCGG 1604
CCO 1607



(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 181 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:



Met Lys Ser Gln Thr Gly Glu Val Gly Glu Lys Gln Ser Leu Lys Cys
1               5                   10                  15

Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr Arg Trp Phe Lys Asp
            20                  25                  30

Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg Ile Lys Tyr Gly Asn
        35                  40                  45

Val Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys Val Arg Val Glu Asp
    50                  55                  60

Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile Leu Gly Lys Asp Thr
65                  70                  75                  80

Val Arg Gly Arg Leu His Val Asn Ser Val Ser Thr Thr Leu Ser Ser
                85                  90                  95

Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr Ala Lys Ser Tyr Cys
            100                 105                 110

Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly Ile Asn Gln Leu Ser
        115                 120                 125

Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg Cys Leu Glu Lys Leu
    130                 135                 140

Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln Ser Val Leu Trp Asp
145                 150                 155                 160

Thr Pro Gly Thr Gly Val Ser Ser Ser Gln Trp Ser Thr Ser Pro Ser
                165                 170                 175

Thr Leu Asp Leu Asn
            180























(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1884 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 664...1883

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

CAGCTACAGC CACAOCAGCA GCAGCACCAG CGAGAGGAGC AGCAGCAGCA GCAGCAGCAG 60 

CAGCGAGAGC CCCAOCAGCA GCAGGAGCAG CAGCAACAAC ACCAGCATCT CTCCTCCCGC 120 

TOCOCCCCCA GAGCCOCGGC CGCAGCAACA GCCGCAGCCC CGCAGCCCCG CAGCCCGGAG 180 

AGCOGCCCCC CGTTCGCGAG CCGCAGCCGC CGGCGGCATG AGGCGCGACC CGGCCCCCGG 240 

CTTCTCCATG CTCCTCTTCG GTGTGTCGCT CCCCTGCTAC TCGCCCAGCC TCAAGTCAGT 300 

GCAGCACCAG CCCTACAAGG CACCCCTGGT GGTGGAGGGC AAGGTACAGG GGCTGGTCCC 360 

AOCCGGCGGC TCCAGCTCCA ACAGCACCCG AGAGCCGCCC GCCTCGGGTC GGCTGGCGTT 420 

GGTAAAGGTG CTGGACAAGT GGCCGCTCCO GAGCGGCGCG CTGCAGCGCG AGCAGGTGAT 480 

CACCGTGGGC TCCTGTGTGC CGCTCGAAAG GAACCAGCGC TACATCTTTT TCCTGGAGCC 540

CACGCAACAG CCCTTAGTCT TTAAGACGGC CTTTGCCCCC CTOATACCAA CGGCAAAAAT 600 

CTCAAGAAAO AGGTGGGCAA GATCCTGTGC ACTGGCTGCG CCACCCCGCC CAAGTTGAAG 660 

AAC ATC AAG AGC CAG ACG GGA CAG GTG GGT GAG AAC CAA TCG CTG AAG 708
    Met Lys Ser Gln Thr Gly Gln Val Gly Glu Lys Gln Ser Leu Lys
    1               5                   10                  15

TGT GAG CCA CCA GCC GGT AAT CCC CAG CCT TCC TAC CGT TGC TTC AAO 756
Cys Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr Arg Trp Phe Lys
                20                  25                  30

CAT GCC AAO OAO CTC AAC CGC AGC CGA GAC ATT CGC ATC AAA TAT GGC 804
Asp Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg Ile Lys Tyr Gly
            35                  40                  45

AAC CGC AGA AAG AAC TCA CGA CTA GAG TTC AAC AAG GTG AAG GTG GAG 852
Asn Gly Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys Val Lys Val Glu
        50                  55                  60

GAC GCT GGG GAG TAT GTC TGC GAG GCC GAG AAC ATC CTC GGG AAG GAC 900
Asp Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile Leu Gly Lys Asp
    65                  70                  75

ACC GTC CCG GGC CCG CTT TAC GTC AAC AGC GTG AGC ACC ACC CTG TCA 948
Thr Val Arg Gly Arg Leu Tyr Val Asn Ser Val Ser Thr Thr Leu Ser
80                  85                  90                  95

TCC TOO TCC GGC CAC GCC CGO AAG TGC AAC GAG ACA GCC AAG TCC TAT 996
Ser Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr Ala Lys Ser Tyr
                100                 105                 110

TGC GTC AAT GGA GGC GTC TGC TAC TAC ATC GAG GGC ATC AAC CAG CTC 1044
Cys Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly Ile Asn Gln Leu
            115                 120                 125

TCC TGC AAA TGT CCA AAT GGA TTC TTC GGA CAG AGA TCT TTG GAG AAA 1092
Ser Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg Cys Leu Glu Lys
        130                 135                 140

CTG CCT TTG CGA TTG TAC ATG CCA GAT CCT AAG CAA AAG CAC CTT GGA 1140
Leu Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln Lys His Leu Gly
    145                 150                 155

TTT GAA TTA AAG GAA GCC GAG GAC CTG TAC CAG AAG AGG GTC CTG ACC 1188
Phe Glu Leu Lys Glu Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr
160                 165                 170                 175

ATC ACG GGC ATC TGC GTG GCT CTG CTG GTC GTG GCC ATC GTC TGT GTG 1236
Ile Thr Gly Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val
                180                 185                 190

GTG GCC TAC TGC AAG ACC AAA AAA CAG CGG AAG CAG ATG CAC AAC CAC 1284
Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His
            195                 200                 205

CTC CCG CAC AAC ATG TGC CCG GCC CAT CAG AAC CGG AGC TTG GCC AAT 1332
Leu Arg Gln Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn
        210                 215                 220

GGG CCC AGC CAC CCC CGG CTG GAC CCA GAG GAG ATC CAO ATG CCA GAT 1380
Gly Pro Ser His Pro Arg Leu Asp Pro Glu Glu Ile Gln Met Ala Asp
    225                 230                 235

TAT ATT TCC AAG AAC GTG CCA GCC ACA GAC CAT GTC ATC AGG AGA GAA 1428
Tyr Ile Ser Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu
240                 245                 250                 255

ACT GAG ACC ACC TTC TCT GGG AGC CAC TCC TGT TCT CCT TCT CAC CAC 1476
Thr Glu Thr Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His
                260                 265                 270

TGC TCC ACA GCC ACA CCC ACC TCC AGC CAC AGA CAC GAG AGC CAC ACG 1524
Cys Ser Thr Ala Thr Pro Thr Ser Ser His Arg His Glu Ser His Thr
            275                 280                 285

TGG AGC CTG GAA CCT TCT GAG AGC CTG ACT TCT GAC TCC CAG TCG GGG 1572
Trp Ser Leu Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly
        290                 295                 300

ATC ATG CTA TCA TCA CTG
Il« Met Leu Ser Ser Val 
305 

GTG GAO GCC CGG GCA ACG 
Val Glu Ala Arg Ala Arg 
320 325 

CGC AGG GCC ACC GCG CCA 

Arg Arg Ala Thr Ala Pro 
340 

GAG TOO CCA CAC AGC GAG 
Asp Ser Pro Hia Ser Glu 
355 

CGC CTC TCG CCC GTG GAC 
Arg Leu Ser Pro Val Asp 
370 

ACT TTC GAG ATC ACG TCC 
Thr Phe Glu He Thr Ser 
385 

CCG GCG GCG CCC ATC ACT 
Pro Ala Ala Pro He Ser 
400 405 



-81- 

GGT ACC AGC AAA TGC AAC 
Gly Thr Ser Lya Cye Asn 
310 315 

CGG GCA GCA GCC TAC AAC 

Arg Ala Ala Ala Tyr Asn 

330 

CCC TAT CAC CAT TCC GTG 

Pro Tyr Hla Asp Ser Val 
345 

AGO TAC GTG TCG GCC CTG 

Arg Tyr Val Ser Ala Leu 
360 

TTC CAC TAC TCG CTG GCC 
Phe His Tyr Ser Leu Ala 
375 

CCC AAC TCG CCG CAC GCC 
Pro Asn Ser Ala His Ala 
390 395 

TAC CGC 
Tyr Arg 



AGC CCA GCA TGT 1620 
Ser Pro Ala Cya 



CTG GAG GAG CGG 1668 
Leu Glu Glu Arg 
335 

GAC TCC CTT CGC 1716 
Asp Ser Leu Arg 
350 

ACC ACG CCC GCG 1764 
Thr Thr Pro Ala 
365 

ACG CAG GTG CCA 1812 

Thr Gin Val Pro 

380 

GTG TCG CTG CCG 1860 
Val Ser Leu Fro 



1884 



(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 407 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(v) FRAGMENT TYPE: internal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

Met Lys Ser Gln Thr Gly Gln Val Gly Glu Lys Gln Ser Leu Lys Cys 
1 5 10 15 

Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser Tyr Arg Trp Phe Lys Asp 
20 25 30 

Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile Arg Ile Lys Tyr Gly Asn 
35 40 45 

Gly Arg Lys Asn Ser Arg Leu Gln Phe Asn Lys Val Lys Val Glu Asp 
50 55 60 

Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn Ile Leu Gly Lys Asp Thr 
65 70 75 80 

Val Arg Gly Arg Leu Tyr Val Asn Ser Val Ser Thr Thr Leu Ser Ser 
85 90 95 

Trp Ser Gly His Ala Arg Lys Cys Asn Glu Thr Ala Lys Ser Tyr Cys 
100 105 110 

Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu Gly Ile Asn Gln Leu Ser 
115 120 125 

Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln Arg Cys Leu Glu Lys Leu 
130 135 140 

Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys Gln Lys His Leu Gly Phe 
145 150 155 160 

Glu Leu Lys Glu Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile 
165 170 175 

Thr Gly Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val 
180 185 190 

Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His Leu 
195 200 205 

Arg Gln Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly 
210 215 220 

Pro Ser His Pro Arg Leu Asp Pro Glu Glu Ile Gln Met Ala Asp Tyr 
225 230 235 240 

Ile Ser Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Thr 
245 250 255 

Glu Thr Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys 
260 265 270 

Ser Thr Ala Thr Pro Thr Ser Ser His Arg His Glu Ser His Thr Trp 
275 280 285 

Ser Leu Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile 
290 295 300 

Met Leu Ser Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val 
305 310 315 320 

Glu Ala Arg Ala Arg Arg Ala Ala Ala Tyr Asn Leu Glu Glu Arg Arg 
325 330 335 

Arg Ala Thr Ala Pro Pro Tyr His Asp Ser Val Asp Ser Leu Arg Asp 
340 345 350 

Ser Pro His Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg 
355 360 365 

Leu Ser Pro Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr 
370 375 380 

Phe Glu Ile Thr Ser Pro Asn Ser Ala His Ala Val Ser Leu Pro Pro 
385 390 395 400 

Ala Ala Pro Ile Ser Tyr Arg 
405 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1476 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: cDNA 

(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 
(B) LOCATION: 69...1475 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

CGOCCCGCGG GGGCGCAGCG CGGCAGCGGA GAGCTGAGGC CGTCCCACCG CCTGGGACCC 60 

CGTGCAGA ATG TCG GAG TCC AGG AGG AGG GGC CGC GGC CGC GGC AAG AAG 110 
Met Ser Glu Ser Arg Arg Arg Gly Arg Gly Arg Gly Lys Lys 
1 5 10 

CAC CCA GAG GGG AGG AAG CGG GAG AGG GAG CCC GAT CCC GGG GAG AAA 158 
His Pro Glu Gly Arg Lys Arg Glu Arg Glu Pro Asp Pro Gly Glu Lys 
15 20 25 30 

GCC ACC CGG CCC AAG TTG AAG AAG ATG AAG AGC CAG AGG GGA GAG GTG 206 
Ala Thr Arg Pro Lys Leu Lys Lys Met Lys Ser Gln Thr Gly Gln Val 
35 40 45 

OCT GAG AAG CAA TCG CTG AAG TGT GAG GCA GCA GCC GGT AAT CCC CAG 254 
Gly Glu Lys Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln 
50 55 60 

CCT TCC TAC CGT TGG TTC AAG GAT GGC AAG GAG CTC AAC CGC AGC CGA 302 
Pro Ser Tyr Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg 
65 70 75 

GAC ATT CGC ATC AAA TAT GGC AAC GGC AGA AAG AAC TCA CGA CTA CAG 350 
Asp Ile Arg Ile Lys Tyr Gly Asn Gly Arg Lys Asn Ser Arg Leu Gln 
80 85 90 

TTC AAC AAG GTG AAG GTG GAG GAC GCT GGG GAG TAT GTC TCC GAG GCC 398 
Phe Asn Lys Val Lys Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala 
95 100 105 110 

GAG AAC ATC CTG GGG AAG GAC ACC GTC CGG GGC CGG CTT TAG GTC AAC 446 
Glu Asn Ile Leu Gly Lys Asp Thr Val Arg Gly Arg Leu Tyr Val Asn 
115 120 125 

AGC GTC AGC ACC ACC CTG TCA TCC TGG TCC GGG CAC GCC CGG AAG TGC 494 
Ser Val Ser Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys 
130 135 140 

AAC GAC ACA GCC AAG TCC TAT TGC GTC AAT GGA GGC GTC TGC TAC TAC 542 
Asn Glu Thr Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr 
145 150 155 

ATC GAC GGC ATC AAC CAG CTC TCC TGC AAA TGT CCA AAT GGA TTC TTC 590 
Ile Glu Gly Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe 
160 165 170 

GGA CAG AGA TGT TTG GAG AAA CTC CCT TTG CCA TTC TAC ATG CCA GAT 638 
Gly Gln Arg Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp 
175 180 185 190 

CCT AAG CAA AAA GCC GAG GAG CTG TAC CAG AAG AGG GTC CTG ACC ATC 686 
Pro Lys Gln Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile 
195 200 205 

ACC GCC ATC TGC GTG GCT CTG CTG GTC GTG GGC ATC GTC TGT GTG CTG 734 
Thr Gly Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val 
210 215 220 

GCC TAC TGC AAG ACC AAA AAA CAG CGG AAG CAG ATC CAC AAC CAC CTC 782 
Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His Leu 
225 230 235 

CGG CAG AAC ATG TGC CCG GCC CAT CAG AAC CGG AGC TTG GCC AAT CGG 830 
Arg Gln Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly 
240 245 250 

CCC AGC CAC CCC CGG CTC CAC CCA GAG GAG ATC CAG ATG GCA GAT TAT 878 
Pro Ser His Pro Arg Leu Asp Pro Glu Glu Ile Gln Met Ala Asp Tyr 
255 260 265 270 

ATT TCC AAG AAC CTG CCA GCC ACA GAC CAT GTC ATC AGG AGA CAA ACT 926 
Ile Ser Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Thr 
275 280 285 

GAC ACC ACC TTC TCT GGG AGC CAC TCC TGT TCT CCT TCT CAC CAC TCC 974 
Glu Thr Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys 
290 295 300 

TCC ACA GCC ACA CCC ACC TCC AGC CAC AGA CAC GAG AGC CAC ACG TGG 1022 
Ser Thr Ala Thr Pro Thr Ser Ser His Arg His Glu Ser His Thr Trp 
305 310 315 

AGC CTG GAA CGT TCT GAG AGC CTG ACT TCT GAC TCC CAG TCC GGG ATC 1070 
Ser Leu Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile 
320 325 330 

ATG CTA TCA TCA GTG GGT ACC AGC AAA TGC AAC AGC CCA GCA TGT GTG 1118 
Met Leu Ser Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val 
335 340 345 350 

CAC GCC CCC CCA AGC CGG GCA GCA GCC TAC AAC CTG GAG GAG CGG CCC 1166 
Glu Ala Arg Ala Arg Arg Ala Ala Ala Tyr Asn Leu Glu Glu Arg Arg 
355 360 365 

AGG GCC ACC GCG CCA CCC TAT CAC GAT TCC GTG GAC TCC CTT CGC CAC 1214 
Arg Ala Thr Ala Pro Pro Tyr His Asp Ser Val Asp Ser Leu Arg Asp 
370 375 380 

TCC CCA CAC AGC GAG AGG TAG GTG TCG GCC CTG ACC ACG CCC GCG CGC 1262 
Ser Pro His Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg 
385 390 395 

CTC TCC CCC GTG GAC TTC CAC TAC TCG CTG GCC ACC CAG GTG CCA ACT 1310 
Leu Ser Pro Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr 
400 405 410 

TTC GAG ATC ACG TCC CCC AAC TCG GCG CAC GCC GTG TCG CTG CCG CCG 1358 
Phe Glu Ile Thr Ser Pro Asn Ser Ala His Ala Val Ser Leu Pro Pro 
415 420 425 430 

GCG GCG CCC ATC AGT TAC CGC CTG GCC GAG CAG CAC CCG TTA CTC CCC 1406 
Ala Ala Pro Ile Ser Tyr Arg Leu Ala Glu Gln Gln Pro Leu Leu Arg 
435 440 445 

CAC CCG GCG CCC CCC GGC CCG GGA CCC GGA CCC GGG CCC GGG CCC GGG 1454 
His Pro Ala Pro Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly 
450 455 460 

CCC GGC GCA GAC ACC GGA ATT C 1476 
Pro Gly Ala Asp Thr Gly Ile 
465 



(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 469 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 

Met Ser Glu Ser Arg Arg Arg Gly Arg Gly Arg Gly Lys Lys His Pro 
1 5 10 15 

Glu Gly Arg Lys Arg Glu Arg Glu Pro Asp Pro Gly Glu Lys Ala Thr 
20 25 30 

Arg Pro Lys Leu Lys Lys Met Lys Ser Gln Thr Gly Gln Val Gly Glu 
35 40 45 

Lys Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser 
50 55 60 

Tyr Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile 
65 70 75 80 

Arg Ile Lys Tyr Gly Asn Gly Arg Lys Asn Ser Arg Leu Gln Phe Asn 
85 90 95 

Lys Val Lys Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn 
100 105 110 

Ile Leu Gly Lys Asp Thr Val Arg Gly Arg Leu Tyr Val Asn Ser Val 
115 120 125 

Ser Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys Asn Glu 
130 135 140 

Thr Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu 
145 150 155 160 

Gly Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln 
165 170 175 

Arg Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys 
180 185 190 

Gln Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly 
195 200 205 

Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val Ala Tyr 
210 215 220 

Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His Leu Arg Gln 
225 230 235 240 

Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly Pro Ser 
245 250 255 

His Pro Arg Leu Asp Pro Glu Glu Ile Gln Met Ala Asp Tyr Ile Ser 
260 265 270 

Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Thr Glu Thr 
275 280 285 

Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys Ser Thr 
290 295 300 

Ala Thr Pro Thr Ser Ser His Arg His Glu Ser His Thr Trp Ser Leu 
305 310 315 320 

Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile Met Leu 
325 330 335 

Ser Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val Glu Ala 
340 345 350 

Arg Ala Arg Arg Ala Ala Ala Tyr Asn Leu Glu Glu Arg Arg Arg Ala 
355 360 365 

Thr Ala Pro Pro Tyr His Asp Ser Val Asp Ser Leu Arg Asp Ser Pro 
370 375 380 

His Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg Leu Ser 
385 390 395 400 

Pro Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr Phe Glu 
405 410 415 

Ile Thr Ser Pro Asn Ser Ala His Ala Val Ser Leu Pro Pro Ala Ala 
420 425 430 

Pro Ile Ser Tyr Arg Leu Ala Glu Gln Gln Pro Leu Leu Arg His Pro 
435 440 445 

Ala Pro Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly 
450 455 460 

Ala Asp Thr Gly Ile 
465 

































(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 422 amino acids 
(B) TYPE: amino acid 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys Lys 
1 5 10 15 

Asp Arg Gly Ser Arg Gly Lys Pro Gly Pro Ala Glu Gly Asp Pro Ser 
20 25 30 

Pro Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gln Glu Ser Ala 
35 40 45 

Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser 
50 55 60 

Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg Lys 
65 70 75 80 

Asn Lys Pro Glu Asn Ile Lys Ile Gln Lys Lys Pro Gly Lys Ser Glu 
85 90 95 

Leu Arg Ile Asn Lys Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys 
100 105 110 

Lys Val Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr 
115 120 125 

Ile Val Glu Ser Asn Glu Phe Ile Thr Gly Met Pro Ala Ser Thr Glu 
130 135 140 

Thr Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser Val Ser Thr 
145 150 155 160 

Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr 
165 170 175 

Ser His Leu Ile Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
180 185 190 

Gly Gly Glu Cys Phe Thr Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr 
195 200 205 

Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn 
210 215 220 

Val Pro Met Lys Val Gln Thr Gln Glu Lys Ala Glu Glu Leu Tyr Gln 
225 230 235 240 

Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile Ala Leu Leu Val Val 
245 250 255 

Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr Lys Lys Gln Arg Gln 
260 265 270 

Lys Leu His Asp Arg Leu Arg Gln Ser Leu Arg Ser Glu Arg Ser Asn 
275 280 285 

Leu Val Asn Ile Ala Asn Gly Pro His His Pro Asn Pro Pro Pro Glu 
290 295 300 

Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys Asn Val Ile Ser Ser 
305 310 315 320 

Glu His Ile Val Glu Arg Glu Val Glu Thr Ser Phe Ser Thr Ser His 
325 330 335 

Tyr Thr Ser Thr Ala His His Ser Thr Thr Val Thr Gln Thr Pro Ser 
340 345 350 

His Ser Trp Ser Asn Gly His Thr Glu Ser Val Ile Ser Glu Ser Asn 
355 360 365 

Ser Val Ile Met Met Ser Ser Val Glu Asn Ser Arg His Ser Ser Pro 
370 375 380 

Ala Gly Gly Pro Arg Gly Arg Leu His Gly Leu Gly Gly Pro Arg Asp 
385 390 395 400 

Asn Ser Phe Leu Arg His Ala Arg Glu Thr Pro Asp Ser Tyr Arg Asp 
405 410 415 

Ser Pro His Ser Glu Arg 
420 



(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 645 amino acids 
(B) TYPE: amino acid 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

Met Ser Glu Arg Lys Glu Gly Arg Gly Lys Gly Lys Gly Lys Lys Lys 
1 5 10 15 

Glu Arg Gly Ser Gly Lys Lys Pro Glu Ser Ala Ala Gly Ser Gln Ser 
20 25 30 

Pro Ala Leu Pro Pro Arg Leu Lys Glu Met Lys Ser Gln Glu Ser Ala 
35 40 45 

Ala Gly Ser Lys Leu Val Leu Arg Cys Glu Thr Ser Ser Glu Tyr Ser 
50 55 60 

Ser Leu Arg Phe Lys Trp Phe Lys Asn Gly Asn Glu Leu Asn Arg Lys 
65 70 75 80 

Asn Lys Pro Gln Asn Ile Lys Ile Gln Lys Lys Pro Gly Lys Ser Glu 
85 90 95 

Leu Arg Ile Asn Lys Ala Ser Leu Ala Asp Ser Gly Glu Tyr Met Cys 
100 105 110 

Lys Val Ile Ser Lys Leu Gly Asn Asp Ser Ala Ser Ala Asn Ile Thr 
115 120 125 

Ile Val Glu Ser Asn Glu Ile Ile Thr Gly Met Pro Ala Ser Thr Glu 
130 135 140 

Gly Ala Tyr Val Ser Ser Glu Ser Pro Ile Arg Ile Ser Val Ser Thr 
145 150 155 160 

Glu Gly Ala Asn Thr Ser Ser Ser Thr Ser Thr Ser Thr Thr Gly Thr 
165 170 175 

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
180 185 190 

Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr 
195 200 205 

Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr 
210 215 220 

Val Met Ala Ser Phe Tyr Lys His Leu Gly Ile Glu Phe Met Glu Ala 
225 230 235 240 

Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly Ile Cys Ile 
245 250 255 

Ala Leu Leu Val Val Gly Ile Met Cys Val Val Ala Tyr Cys Lys Thr 
260 265 270 

Lys Lys Gln Arg Lys Lys Leu His Asp Arg Leu Arg Gln Ser Leu Arg 
275 280 285 

Ser Glu Arg Asn Asn Met Met Asn Ile Ala Asn Gly Pro His His Pro 
290 295 300 

Asn Pro Pro Pro Glu Asn Val Gln Leu Val Asn Gln Tyr Val Ser Lys 
305 310 315 320 

Asn Val Ile Ser Ser Glu His Ile Val Glu Arg Glu Ala Glu Thr Ser 
325 330 335 

Phe Ser Thr Ser His Tyr Thr Ser Thr Ala His His Ser Thr Thr Val 
340 345 350 

Thr Gln Thr Pro Ser His Ser Trp Ser Asn Gly His Thr Glu Ser Ile 
355 360 365 

Leu Ser Glu Ser His Ser Val Ile Val Met Ser Ser Val Glu Asn Ser 
370 375 380 

Arg His Ser Ser Pro Thr Gly Gly Pro Arg Gly Arg Leu Asn Gly Thr 
385 390 395 400 

Gly Gly Pro Arg Glu Cys Asn Ser Phe Leu Arg His Ala Arg Glu Thr 
405 410 415 

Pro Asp Ser Tyr Arg Asp Ser Pro His Ser Glu Arg Tyr Val Ser Ala 
420 425 430 

Met Thr Thr Pro Ala Arg Met Ser Pro Val Asp Phe His Thr Pro Ser 
435 440 445 

Ser Pro Lys Ser Pro Pro Ser Glu Met Ser Pro Pro Val Ser Ser Met 
450 455 460 

Thr Val Ser Met Pro Ser Met Ala Val Ser Pro Phe Met Glu Glu Glu 
465 470 475 480 

Arg Pro Leu Leu Leu Val Thr Pro Pro Arg Leu Arg Glu Lys Lys Phe 
485 490 495 

Asp His His Pro Gln Gln Phe Ser Ser Phe His His Asn Pro Ala His 
500 505 510 

Asp Ser Asn Ser Leu Pro Ala Ser Pro Leu Arg Ile Val Glu Asp Glu 
515 520 525 

Glu Tyr Glu Thr Thr Gln Glu Tyr Glu Pro Ala Gln Glu Pro Val Lys 
530 535 540 

Lys Leu Ala Asn Ser Arg Arg Ala Lys Arg Thr Lys Pro Asn Gly His 
545 550 555 560 

Ile Ala Asn Arg Leu Glu Val Asp Ser Asn Thr Ser Ser Gln Ser Ser 
565 570 575 

Asn Ser Glu Ser Glu Thr Glu Asp Glu Arg Val Gly Glu Asp Thr Pro 
580 585 590 

Phe Leu Gly Ile Gln Asn Pro Leu Ala Ala Ser Leu Glu Ala Thr Pro 
595 600 605 

Ala Phe Arg Leu Ala Asp Ser Arg Thr Asn Pro Ala Gly Arg Phe Ser 
610 615 620 

Thr Gln Glu Glu Ile Gln Ala Arg Leu Ser Ser Val Ile Ala Asn Gln 
625 630 635 640 

Asp Pro Ile Ala Val 
645 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 

Gly His Ala Arg Lys Cys Asn Glu Thr Ala Lys Ser Tyr Cys Val Asn 
1 5 10 15 

Gly Gly Val Cys Tyr Tyr Ile Glu Gly Ile Asn Gln Leu Ser Cys Lys 
20 25 30 

Cys Pro Asn Gly Phe Phe Gly Gln Arg Cys Leu Glu Lys Leu Pro 
35 40 45 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
1 5 10 15 

Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr 
20 25 30 

Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn 
35 40 45 

Val Pro 
50 



(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

Ser His Leu Ile Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
1 5 10 15 

Gly Gly Glu Cys Phe Thr Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr 
20 25 30 

Leu Cys Lys Cys Gln Pro Gly Phe Thr Gly Ala Arg Cys Thr Glu Asn 
35 40 45 

Val Pro 
50 

(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 

Ser His Leu Val Lys Cys Ala Glu Lys Glu Lys Thr Phe Cys Val Asn 
1 5 10 15 

Gly Gly Glu Cys Phe Met Val Lys Asp Leu Ser Asn Pro Ser Arg Tyr 
20 25 30 

Leu Cys Lys Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr 
35 40 45 

Val Met 
50 



(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 

Ser His Leu Thr Lys Cys Asp Ile Lys Gln Lys Ala Phe Cys Val Asn 
1 5 10 15 

Gly Gly Glu Cys Tyr Met Val Lys Asp Leu Pro Asn Pro Pro Arg Tyr 
20 25 30 

Leu Cys Arg Cys Pro Asn Glu Phe Thr Gly Asp Arg Cys Gln Asn Tyr 
35 40 45 

Val Met 
50 



(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

Gly Lys Arg Asp Pro Cys Leu Arg Lys Tyr Lys Asp Phe Cys Ile His 
1 5 10 15 

Gly Glu Cys Lys Tyr Val Lys Glu Leu Arg Ala Pro Ser Cys Ile Cys 
20 25 30 

His Pro Gly Tyr His Gly Glu Arg Cys His Gly Leu Ser Leu 
35 40 45 



(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 

Asn Ser Asp Ser Glu Cys Pro Leu Ser His Asp Gly Tyr Cys Leu His 
1 5 10 15 

Asp Gly Val Cys Met Tyr Ile Glu Ala Leu Asp Lys Tyr Ala Cys Asn 
20 25 30 

Cys Val Val Gly Tyr Ile Gly Glu Arg Cys Gln Tyr Arg Asp Leu 
35 40 45 



(2) INFORMATION FOR SEQ ID NO: 18: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 46 amino acids 
(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 

Lys Lys Lys Asn Pro Cys Asn Ala Glu Phe Gln Asn Phe Cys Ile His 
1 5 10 15 

Gly Glu Cys Lys Tyr Ile Glu His Leu Glu Ala Val Thr Cys Lys Cys 
20 25 30 

Gln Gln Glu Tyr Phe Gly Glu Arg Cys Gly Glu Lys Ser Met 
35 40 45 



(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 46 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

Ser His Phe Asn Asp Cys Pro Asp Ser His Thr Gln Phe Cys Phe His 
1 5 10 15 

Gly Thr Cys Arg Phe Leu Val Gln Glu Asp Lys Pro Ala Cys Val Cys 
20 25 30 

His Ser Gly Tyr Val Gly Ala Arg Cys Glu His Ala Asp Leu 
35 40 45 



(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

Val Leu Thr Ile Thr Gly Ile Cys Val Ala Leu Leu Val Val Gly Ile 
1 5 10 15 

Val Cys Val Val Ala Tyr Cys 
20 

(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(ix) FEATURE: 

(A) NAME/KEY: Coding Sequence 

(B) LOCATION: 9...9 

(D) OTHER INFORMATION: where Xaa at position 9 is 
Isoleucine or Valine 

(A) NAME/KEY: Other 

(B) LOCATION: 17...17 

(D) OTHER INFORMATION: where Xaa at position 17 is 
Methionine or Valine 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

Val Leu Thr Ile Thr Gly Ile Cys Xaa Ala Leu Leu Val Val Gly Ile 
1 5 10 15 

Xaa Cys Val Val Ala Tyr Cys 
20 
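
SEQ ID NO: 21 is a consensus in which Xaa at position 9 is isoleucine or valine and Xaa at position 17 is methionine or valine, so it stands for four concrete 23-residue peptides. The following is a minimal sketch, not part of the original listing, that enumerates them. The one-letter string and the names `CONSENSUS`, `ALTERNATIVES` and `expand` are illustrative assumptions.

```python
from itertools import product

# SEQ ID NO: 21 in one-letter code, with X marking the two Xaa positions.
CONSENSUS = "VLTITGICXALLVVGIXCVVAYC"

# 1-based position -> residues permitted at that position (from the FEATURE table).
ALTERNATIVES = {9: "IV", 17: "MV"}


def expand(consensus: str, alternatives: dict[int, str]) -> list[str]:
    """Return every concrete sequence covered by the consensus."""
    positions = sorted(alternatives)
    variants = []
    for choice in product(*(alternatives[p] for p in positions)):
        residues = list(consensus)
        for pos, residue in zip(positions, choice):
            residues[pos - 1] = residue  # convert 1-based position to 0-based index
        variants.append("".join(residues))
    return variants


for variant in expand(CONSENSUS, ALTERNATIVES):
    print(variant)
```

One of the four variants (valine at both positions) is the peptide of SEQ ID NO: 20; the variant with isoleucine at position 9 and methionine at position 17 corresponds to residues 241 to 263 of SEQ ID NO: 10.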

(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 
GACTTGGCTC TCTCC 15 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 15 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 

GGACTCCGAC ATTCT 15 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 21 amino acids 
(B) TYPE: amino acid 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: peptide 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 

Met Ala Leu Pro Val Thr Ala Leu Leu Leu Pro Leu Ala Leu Leu Leu 
1 5 10 15 

His Ala Ala Arg Pro 
20 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 
AAAAAGAATT CCTCCATGTC AACAGCGTG 29 

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 33 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TCCTCTCTCC ACTCACTTAG GATCTGGCAT GTA 33 



(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 

CACAGTCCAC CCCTCAG 17 



(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
GCTCTGGTAA GCAAACATGC 20 



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29: 
TGTGAACTCC TCTCGCCTGT 20 



(2) INFORMATION FOR SEQ ID NO: 30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
GAAGCGGCTO GGCATTTAAT 20



(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2268 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: Coding Sequence
(B) LOCATION: 69...2009

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:

CGGGCGGCGG GGGCGCAGCG CGGCAGCGGA GAGCTGAGGC CGTCCCACCG CCTGGGACCC 60 

CCTGCAGA ATG TCG GAG TCC AAG AGG AGG GGC CGC GGC CGC GGC AAG AAG 110
Met Ser Glu Ser Lys Arg Arg Gly Arg Gly Arg Gly Lys Lys
1 5 10

CAC CCA GAG GGG AGG AAG CGG GAG AGG GAG CCC GAT CCC GGG GAG AAA 158
His Pro Glu Gly Arg Lys Arg Glu Arg Glu Pro Asp Pro Gly Glu Lys
15 20 25 30

GCC ACC CGG CCC AAG TTG AAG AAG ATG AAG AGC CAG ACG GGA CAG GTG 206
Ala Thr Arg Pro Lys Leu Lys Lys Met Lys Ser Gln Thr Gly Gln Val
35 40 45

GGT GAG AAG CAA TCG CTG AAG TGT GAG GCA GCA GCC GGT AAT CCC CAG 254
Gly Glu Lys Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln
50 55 60

CCT TCC TAC CGT TGG TTC AAG GAT GGC AAG GAG CTC AAC CGC AGC CGA 302
Pro Ser Tyr Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg
65 70 75

GAC ATT CGC ATC AAA TAT GGC AAC GGC AGA AAG AAC TCA CGA CTA CAG 350
Asp Ile Arg Ile Lys Tyr Gly Asn Gly Arg Lys Asn Ser Arg Leu Gln
80 85 90

TTC AAC AAG GTG AAG GTG GAG GAC GCT GGG GAG TAT GTC TGC GAG GCC 398
Phe Asn Lys Val Lys Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala
95 100 105 110

GAG AAC ATC CTG GGG AAG GAC ACC GTC CGG GGC CGG CTT TAC GTC AAC 446
Glu Asn Ile Leu Gly Lys Asp Thr Val Arg Gly Arg Leu Tyr Val Asn
115 120 125

AGC GTC AGC ACC ACC CTG TCA TCC TGG TCG GGG CAC GCC CGC AAG TGC 494
Ser Val Ser Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys
130 135 140

AAC GAG ACA GCC AAG TCC TAT TGC GTC AAT GGA GGC GTC TGC TAC TAC 542
Asn Glu Thr Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr
145 150 155

ATC GAG GGC ATC AAC CAG CTC TCC TGC AAA TGT CCA AAT GGA TTC TTC 590
Ile Glu Gly Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe
160 165 170

GGA CAG AGA TGT TTG GAG AAA CTG CCT TTG CGA TTG TAC ATG CCA GAT 638
Gly Gln Arg Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp
175 180 185 190

CCT AAG CAA AAA GCC GAG GAG CTG TAC CAG AAG AGG GTC CTG ACC ATC 686
Pro Lys Gln Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile
195 200 205

ACG GGC ATC TGC GTG GCT CTC CTG GTC GTG GGC ATC GTC TGT GTG GTG 734
Thr Gly Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val
210 215 220

GCC TAC TGC AAG ACC AAA AAA CAG CGG AAG CAG ATG CAC AAC CAC CTC 782
Ala Tyr Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His Leu
225 230 235

CGC CAG AAC ATG TGC CCG GCC CAT CAG AAC CGG AGC TTG GCC AAT GGG 830
Arg Gln Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly
240 245 250

CCC AGC CAC CCC CGG CTG GAC CCA GAG GAG ATG CAG ATG GCA GAT TAT 878
Pro Ser His Pro Arg Leu Asp Pro Glu Glu Met Gln Met Ala Asp Tyr
255 260 265 270

ATT TCC AAG AAC GTC CCA GCC ACA GAC CAT GTC ATC AGG AGA GAA ACT 926
Ile Ser Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Thr
275 280 285

GAG ACC ACC TTC TCT GGG AGC CAC TCC TGT TCT CCT TCT CAC CAC TGC 974
Glu Thr Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys
290 295 300

TCC ACA GCC ACA CCC ACC TCC ACC CAC AGA CAC GAG AGC CAC ACG TGG 1022
Ser Thr Ala Thr Pro Thr Ser Thr His Arg His Glu Ser His Thr Trp
305 310 315

AGC CTG GAA CGT TCT GAG AGC CTG ACT TCT GAC TCC CAG TCG GGG ATC 1070
Ser Leu Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile
320 325 330

ATG CTA TCA TCA GTG GGT ACC AGC AAA TGC AAC AGC CCA GCA TGT GTG 1118
Met Leu Ser Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val
335 340 345 350

GAG GCC CGG GCA AGG CGG GCA GCA GCC TAC AAC CTG GAG GAG CGG CGC 1166
Glu Ala Arg Ala Arg Arg Ala Ala Ala Tyr Asn Leu Glu Glu Arg Arg
355 360 365

AGG GCC ACC GCG CCA CCC TAT CAC GAT TCC GTG GAC TCC CTT CGC GAC 1214
Arg Ala Thr Ala Pro Pro Tyr His Asp Ser Val Asp Ser Leu Arg Asp
370 375 380

TCC CCA CAC AGC GAG AGG TAC GTG TCG GCC CTG ACC ACG CCC GCG CGC 1262
Ser Pro His Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg
385 390 395

CTC TCC CCC GTG GAC TTC CAC TAC TCG CTG GCC ACG CAG GTG CCA ACT 1310
Leu Ser Pro Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr
400 405 410

TTC GAG ATC ACG TCC CCC AAC TCG GCG CAC GCC GTG TCC CTG CCG CCG 1358
Phe Glu Ile Thr Ser Pro Asn Ser Ala His Ala Val Ser Leu Pro Pro
415 420 425 430

GCG GCG CCC ATC AGT TAC CGC CTG GCC GAG CAG CAG CCG TTA CTG CGC 1406
Ala Ala Pro Ile Ser Tyr Arg Leu Ala Glu Gln Gln Pro Leu Leu Arg
435 440 445

CAC CCG GCG CCC CCC GGC CCC GGA CCC GGA CCC GGG CCC GGG CCC GGG 1454
His Pro Ala Pro Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly
450 455 460

CCC GGC GCA GAC ATG CAG CGC AGC TAT GAC AGC TAC TAT TAC CCC GCG 1502
Pro Gly Ala Asp Met Gln Arg Ser Tyr Asp Ser Tyr Tyr Tyr Pro Ala
465 470 475

GCG GGG CCC GGA CCG CGG CGC GGG ACC TGC GCG CTC GGC GGC AGC CTG 1550
Ala Gly Pro Gly Pro Arg Arg Gly Thr Cys Ala Leu Gly Gly Ser Leu
480 485 490

GGC AGC CTC CCT GCC AGC CCC TTC CGC ATC CCC GAG GAC GAC GAG TAC 1598
Gly Ser Leu Pro Ala Ser Pro Phe Arg Ile Pro Glu Asp Asp Glu Tyr
495 500 505 510

GAG ACC ACG CAG GAG TGC GCG CCC CCG CCG CCG CCG CGG CCG CGC GCG 1646
Glu Thr Thr Gln Glu Cys Ala Pro Pro Pro Pro Pro Arg Pro Arg Ala
515 520 525

CGC GGT GCG TCC CGC AGG ACG TCG GCG GGG CCC CGG CGC TGG CGC CGC 1694
Arg Gly Ala Ser Arg Arg Thr Ser Ala Gly Pro Arg Arg Trp Arg Arg
530 535 540

TCG CGC CTC AAC GGG CTG GCG GCG CAG CGC GCA CGG GCG GCG AGG GAC 1742
Ser Arg Leu Asn Gly Leu Ala Ala Gln Arg Ala Arg Ala Ala Arg Asp
545 550 555

TCC CTG TCG CTG AGC AGC GGC TCG GGC GGC GGC TCA GCC TCG GCG TCG 1790
Ser Leu Ser Leu Ser Ser Gly Ser Gly Gly Gly Ser Ala Ser Ala Ser
560 565 570

GAC GAC GAC GCG GAC GAC GCG GAC GGG GCG CTG GCG GCC GAG AGC ACA 1838
Asp Asp Asp Ala Asp Asp Ala Asp Gly Ala Leu Ala Ala Glu Ser Thr
575 580 585 590

CCT TTC CTG GGC CTG CGT GGG GCG CAC GAC GCG CTG CGC TCG GAC TCC 1886
Pro Phe Leu Gly Leu Arg Gly Ala His Asp Ala Leu Arg Ser Asp Ser
595 600 605

CCG CCA CTG TGC CCC GCG GCC GAC AGC AGG ACT TAC TAC TCA CTG GAC 1934
Pro Pro Leu Cys Pro Ala Ala Asp Ser Arg Thr Tyr Tyr Ser Leu Asp
610 615 620

AGC CAC AGC ACG CGG GCC AGC AGC AGA CAC AGC CGC GGG CCG CCC CCG 1982
Ser His Ser Thr Arg Ala Ser Ser Arg His Ser Arg Gly Pro Pro Pro
625 630 635

CGG GCC AAG CAG GAC TCG GCG CCA CTC TAGGGCCCCG CCGCGCGCCC CTCCOCC 2036
Arg Ala Lys Gln Asp Ser Ala Pro Leu
640 645

CCGCCOCCCC CACTATCTTT AAGCAGACCA GAGACCCCCT ACTGGAOAOA AAGGAGCAAA 2096 

AAAOAAATAA AAATATTTTT ATTTTCTATA AAAGGAAAAA AGTATAACAA AATGTTTTAT 2156 

TTTCATTTTA GCAAAAATTG TCTTATAATA CTAGCTAACG GCAAAGGCGT TTTTATAOGG 2216 

AAACTATTTA TATGTAACAT CCTGATTTAC AGCTTCGGAA AAAAAAAAGA AA 2268 
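
The feature entry for SEQ ID NO: 31 gives the coding sequence location as 69...2009, and SEQ ID NO: 32 is listed with a length of 647 amino acids. The two figures can be cross-checked with a minimal Python sketch, assuming the location is 1-based and inclusive, that the stop codon lies outside it, and that the standard genetic code applies. The helper names `codon_count` and `translate` are illustrative and do not come from the application.

```python
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_count(start, end):
    """Number of whole codons in a 1-based, inclusive location (assumed convention)."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("location does not span a whole number of codons")
    return length // 3


def translate(dna):
    """Translate a DNA string codon by codon into one-letter amino acid codes."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Coding sequence location taken from the SEQ ID NO: 31 feature entry.
n_codons = codon_count(69, 2009)

# First seven codons of the coding sequence as printed in SEQ ID NO: 31.
first_residues = translate("ATG TCG GAG TCC AAG AGG AGG")

print(n_codons, first_residues)
```

Under these assumptions the location spans 647 codons, matching the stated length of SEQ ID NO: 32, and the first seven codons translate to MSESKRR (Met Ser Glu Ser Lys Arg Arg).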



(2) INFORMATION FOR SEQ ID NO: 32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 647 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein
(v) FRAGMENT TYPE: internal

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:

Met Ser Glu Ser Lys Arg Arg Gly Arg Gly Arg Gly Lys Lys His Pro
1 5 10 15

Glu Gly Arg Lys Arg Glu Arg Glu Pro Asp Pro Gly Glu Lys Ala Thr
20 25 30

Arg Pro Lys Leu Lys Lys Met Lys Ser Gln Thr Gly Gln Val Gly Glu
35 40 45

Lys Gln Ser Leu Lys Cys Glu Ala Ala Ala Gly Asn Pro Gln Pro Ser
50 55 60

Tyr Arg Trp Phe Lys Asp Gly Lys Glu Leu Asn Arg Ser Arg Asp Ile
65 70 75 80

Arg Ile Lys Tyr Gly Asn Gly Arg Lys Asn Ser Arg Leu Gln Phe Asn
85 90 95

Lys Val Lys Val Glu Asp Ala Gly Glu Tyr Val Cys Glu Ala Glu Asn
100 105 110

Ile Leu Gly Lys Asp Thr Val Arg Gly Arg Leu Tyr Val Asn Ser Val
115 120 125

Ser Thr Thr Leu Ser Ser Trp Ser Gly His Ala Arg Lys Cys Asn Glu
130 135 140

Thr Ala Lys Ser Tyr Cys Val Asn Gly Gly Val Cys Tyr Tyr Ile Glu
145 150 155 160

Gly Ile Asn Gln Leu Ser Cys Lys Cys Pro Asn Gly Phe Phe Gly Gln
165 170 175

Arg Cys Leu Glu Lys Leu Pro Leu Arg Leu Tyr Met Pro Asp Pro Lys
180 185 190

Gln Lys Ala Glu Glu Leu Tyr Gln Lys Arg Val Leu Thr Ile Thr Gly
195 200 205

Ile Cys Val Ala Leu Leu Val Val Gly Ile Val Cys Val Val Ala Tyr
210 215 220

Cys Lys Thr Lys Lys Gln Arg Lys Gln Met His Asn His Leu Arg Gln
225 230 235 240

Asn Met Cys Pro Ala His Gln Asn Arg Ser Leu Ala Asn Gly Pro Ser
245 250 255

His Pro Arg Leu Asp Pro Glu Glu Met Gln Met Ala Asp Tyr Ile Ser
260 265 270

Lys Asn Val Pro Ala Thr Asp His Val Ile Arg Arg Glu Thr Glu Thr
275 280 285

Thr Phe Ser Gly Ser His Ser Cys Ser Pro Ser His His Cys Ser Thr
290 295 300

Ala Thr Pro Thr Ser Thr His Arg His Glu Ser His Thr Trp Ser Leu
305 310 315 320

Glu Arg Ser Glu Ser Leu Thr Ser Asp Ser Gln Ser Gly Ile Met Leu
325 330 335

Ser Ser Val Gly Thr Ser Lys Cys Asn Ser Pro Ala Cys Val Glu Ala
340 345 350

Arg Ala Arg Arg Ala Ala Ala Tyr Asn Leu Glu Glu Arg Arg Arg Ala
355 360 365

Thr Ala Pro Pro Tyr His Asp Ser Val Asp Ser Leu Arg Asp Ser Pro
370 375 380

His Ser Glu Arg Tyr Val Ser Ala Leu Thr Thr Pro Ala Arg Leu Ser
385 390 395 400

Pro Val Asp Phe His Tyr Ser Leu Ala Thr Gln Val Pro Thr Phe Glu
405 410 415

Ile Thr Ser Pro Asn Ser Ala His Ala Val Ser Leu Pro Pro Ala Ala
420 425 430

Pro Ile Ser Tyr Arg Leu Ala Glu Gln Gln Pro Leu Leu Arg His Pro
435 440 445

Ala Pro Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly Pro Gly
450 455 460

Ala Asp Met Gln Arg Ser Tyr Asp Ser Tyr Tyr Tyr Pro Ala Ala Gly
465 470 475 480

Pro Gly Pro Arg Arg Gly Thr Cys Ala Leu Gly Gly Ser Leu Gly Ser
485 490 495

Leu Pro Ala Ser Pro Phe Arg Ile Pro Glu Asp Asp Glu Tyr Glu Thr
500 505 510

Thr Gln Glu Cys Ala Pro Pro Pro Pro Pro Arg Pro Arg Ala Arg Gly
515 520 525

Ala Ser Arg Arg Thr Ser Ala Gly Pro Arg Arg Trp Arg Arg Ser Arg
530 535 540

Leu Asn Gly Leu Ala Ala Gln Arg Ala Arg Ala Ala Arg Asp Ser Leu
545 550 555 560

Ser Leu Ser Ser Gly Ser Gly Gly Gly Ser Ala Ser Ala Ser Asp Asp
565 570 575

Asp Ala Asp Asp Ala Asp Gly Ala Leu Ala Ala Glu Ser Thr Pro Phe
580 585 590

Leu Gly Leu Arg Gly Ala His Asp Ala Leu Arg Ser Asp Ser Pro Pro
595 600 605

Leu Cys Pro Ala Ala Asp Ser Arg Thr Tyr Tyr Ser Leu Asp Ser His
610 615 620

Ser Thr Arg Ala Ser Ser Arg His Ser Arg Gly Pro Pro Pro Arg Ala
625 630 635 640

Lys Gln Asp Ser Ala Pro Leu
645

(2) INFORMATION FOR SEQ ID NO: 33:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 139 amino acids
(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:

Pro
Trp
Thr
Val
Ser
Ser
Ser
Gln
Thr
Leu
135

What is claimed is:

1. An isolated nucleic acid encoding a Don-1 
polypeptide. 

2. An isolated nucleic acid of claim 1, wherein
the nucleic acid encodes an amino acid sequence of SEQ ID
NO:2, 4, 6, 8, or 32.

3. A nucleic acid of claim 1, wherein said
nucleic acid encodes a soluble Don-1 polypeptide.

4. A nucleic acid of claim 1, wherein said
nucleic acid comprises the nucleotide sequence of SEQ ID
NO:1, 3, 5, 7, or 31.

5. A nucleic acid of claim 1, wherein said
nucleic acid encodes the epidermal growth factor (EGF)
domain of Don-1 having SEQ ID NO: 11.

6. A nucleic acid of claim 1, wherein said
nucleic acid encodes the extracellular domain of Don-1.

7. A nucleic acid encoding a hybrid polypeptide,
said hybrid polypeptide comprising a first portion and a
second portion, said first portion comprising a Don-1
polypeptide and said second portion comprising an
immunoglobulin constant (Fc) region.

8. A nucleic acid of claim 1, wherein the first
portion comprises the epidermal growth factor (EGF)
domain of Don-1.

9. A nucleic acid of claim 1 encoding the amino
acid sequence of the Ig domain of Don-1.

10. A nucleic acid of claim 1 encoding the amino
acid sequence of the transmembrane (TM) domain of Don-1.

11. An isolated nucleic acid of claim 1
comprising the nucleotide sequence of the don-1 gene
contained in A.T.C.C. deposit 98096, 98097, or 98098.

12. An isolated nucleic acid of claim 1 that
hybridizes to the nucleotide sequence of SEQ ID NO:1, 3,
5, 7, or 31 or its complement.

13. An isolated nucleic acid of claim 12, wherein
the nucleic acid encodes a polypeptide that activates
receptor-type tyrosine kinases that have a molecular
weight of about 185 kDa.

14. An isolated nucleic acid of claim 1 that
hybridizes to the nucleotide sequence of the don-1 gene
contained in A.T.C.C. deposit 98096, 98097, or 98098.

15. An isolated nucleic acid of claim 14, wherein
the nucleic acid encodes a polypeptide that activates
receptor-type tyrosine kinases that have a molecular
weight of about 185 kDa.

16. An isolated nucleic acid of claim 1 that
hybridizes to the nucleotide sequence of the
transmembrane (TM) domain of the don-1 gene, wherein the
isolated nucleic acid encodes a polypeptide that
activates receptor-type tyrosine kinases that have a
molecular weight of about 185 kDa.

17. An isolated nucleic acid of claim 1 that
hybridizes to the nucleotide sequence of the epidermal
growth factor (EGF) domain of the don-1 gene, wherein the
isolated nucleic acid encodes a polypeptide that
activates receptor-type tyrosine kinases that have a
molecular weight of about 185 kDa.

18. A host cell comprising the nucleic acid of
claim 1.

19. A nucleic acid vector comprising the nucleic
acid of claim 1.

20. A nucleic acid vector of claim 19, wherein
the vector is an expression vector.

21. A substantially pure Don-1 polypeptide.

22. A substantially pure polypeptide of claim 21,
wherein said polypeptide is soluble.

23. A polypeptide of claim 21, wherein said
polypeptide comprises the epidermal growth factor (EGF)
domain of Don-1.

24. A polypeptide of claim 21, wherein said
polypeptide comprises the extracellular domain of Don-1.

25. A polypeptide of claim 21, wherein said
polypeptide comprises the amino acid sequence of SEQ ID
NO:2, 4, 6, 8, or 32.

26. A polypeptide of claim 21, wherein said
polypeptide is encoded by the nucleic acid sequence of
SEQ ID NO:1, 3, 5, 7, or 31.

27. A polypeptide of claim 21, wherein said
polypeptide is encoded by the don-1 gene contained in
A.T.C.C. deposit 98096, 98097, or 98098.

28. A substantially pure polypeptide of claim 21,
wherein the polypeptide is at least 80% identical to the
amino acid sequence of the epidermal growth factor (EGF)
domain of Don-1.

29. The polypeptide of claim 28, wherein the EGF
domain has the sequence of SEQ ID NO: 11.

30. A substantially pure polypeptide of claim 1,
wherein the polypeptide is at least 80% identical to the
amino acid sequence of the Ig domain of Don-1.

31. The polypeptide of claim 30, wherein the Ig
domain extends from about amino acid 16 to about amino
acid 70 in SEQ ID NO: 2, 4, or 6, or from about amino acid
54 to about amino acid 108 in SEQ ID NOs: 8 and 32.

32. A substantially pure polypeptide of claim 1,
wherein the polypeptide is at least 90% identical to the
amino acid sequence of the transmembrane (TM) domain of
Don-1.

33. The polypeptide of claim 32, wherein the TM
domain has the sequence of SEQ ID NO: 20.

34. A substantially pure polypeptide comprising a
first portion and a second portion, said first portion
comprising a Don-1 polypeptide and said second portion
comprising an immunoglobulin constant (Fc) region or a
detectable marker.




35. An antibody that specifically binds to a Don-1
polypeptide.

36. A pharmaceutical composition comprising a
polypeptide of claim 21.

37. A method for detecting Don-1 in a sample, the
method comprising:

obtaining a biological sample;

contacting the sample with an anti-Don-1 antibody
of claim 35 under conditions that allow the formation of
Don-1-antibody complexes; and

detecting the complexes, if any, as an indication
of the presence of Don-1 in the biological sample.

38. A method for stimulating proliferation of a
cell, the method comprising administering to the cell an
amount of a Don-1 polypeptide effective to stimulate
proliferation of the cell.

39. A method for decreasing proliferation of a
cell, the method comprising administering to the cell an
amount of a Don-1 polypeptide inhibitor effective to
decrease proliferation of the cell.

40. A method of claim 39, wherein said inhibitor
is an antibody that selectively binds to Don-1.

41. A method of obtaining a splice variant cDNA
of the don-1 gene, the method comprising

obtaining a labeled probe comprising an isolated
nucleic acid that encodes all or a portion of the
epidermal growth factor (EGF) domain of Don-1;

screening a nucleic acid fragment library with the
labeled probe under conditions that allow hybridization
of the probe to nucleic acid fragments in the library to
form nucleic acid duplexes;

isolating labeled duplexes, if any; and

preparing a full-length cDNA from the fragments in
any labeled duplex to obtain a splice variant cDNA of the
don-1 gene.

42. A method of claim 41, wherein the EGF domain
has the amino acid sequence of SEQ ID NO: 11.

43. A method of obtaining a gene related to the
don-1 gene, the method comprising

obtaining a labeled probe comprising an isolated
nucleic acid that encodes all or a portion of the
transmembrane (TM) domain of Don-1;

screening a nucleic acid fragment library with the
labeled probe under conditions that allow hybridization
of the probe to nucleic acid fragments in the library to
form nucleic acid duplexes;

isolating labeled duplexes, if any; and

preparing a full-length gene sequence from the
nucleic acid fragments in any labeled duplex to obtain a
gene related to the don-1 gene.

44. A method of claim 43, wherein the TM domain 
has the amino acid sequence of SEQ ID NO: 20. 
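
Claims 28, 30, and 32 recite polypeptides that are at least 80% or at least 90% identical to a Don-1 domain. The following Python sketch shows one way such a threshold could be checked. It assumes identity is counted position by position over an ungapped comparison of two equal-length sequences; the claims themselves do not state the comparison method, so this is an illustration only. The function name `percent_identity` is hypothetical, the reference string is residues 1-20 of SEQ ID NO: 32 in one-letter code (used as a stand-in, since the domain sequences are not reproduced in this section), and the variant is invented for the example.

```python
def percent_identity(seq_a, seq_b):
    """Percent of positions that match in an ungapped, equal-length comparison."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Residues 1-20 of SEQ ID NO: 32 in one-letter code (stand-in reference).
reference = "MSESKRRGRGRGKKHPEGRK"

# Hypothetical variant with two substitutions (positions 2 and 20).
variant = "MTESKRRGRGRGKKHPEGRA"

identity = percent_identity(reference, variant)
meets_80 = identity >= 80.0  # threshold recited in claims 28 and 30
meets_90 = identity >= 90.0  # threshold recited in claim 32

print(identity, meets_80, meets_90)
```

A gapped alignment or a different denominator (for example, the length of the shorter sequence) would give different percentages, so any real comparison should follow the definition given in the specification.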
[Figure pages: nucleotide sequences shown with their deduced amino acid translations.]

FIGURE 4 (continued)